15 datasets found
  1. Adult Social Care slides - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 14, 2020
    Cite
    ckan.publishing.service.gov.uk (2020). Adult Social Care slides - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/adult-social-care-slides
    Dataset updated
    Aug 14, 2020
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    Unequal impact of COVID-19: BAME disproportionality

    Section 1 (slides 1–3): The Public Health England (PHE) review confirms the risk of mortality as a result of COVID-19 by ethnicity. Data on access to care and emergency response has been taken from our local VCS partner feedback and indications from local data. Data on the care workforce by ethnicity was taken from our local data.

    Section 2 (slides 4–7) covers demographic information on Black, Asian, and other or mixed ethnic people delivering direct care in the wider social care sector, from the Skills for Care 2019 Social Care Workforce Review (note: factors that need to be considered are age, sex, underlying health conditions, ethnicity, and pregnancy). Information on Camden’s ASC workforce was taken from the GLA 2016-based Ethnic Group Projections (mid-2020). Demographic information on people receiving ASC support in Camden has been taken from our local service data.

    Section 3 (slides 8–15) sets out information on Adult Social Care activity during COVID-19 and looks at data relative to ethnicity, including the ASC cohort of Camden’s shielded residents. (Service-held data, NOT official statistics, including qualitative feedback from communities.)

    Section 4 (slides 16–18) shows information related to the Adult Social Care Outcomes Framework, which has provided some information gathered before COVID-19 on the experiences of people who are BAME and in receipt of social care support in Camden.

  2. Recurrent Breast Cancer: Histopathological and Hyperspectral Images Database

    • cancerimagingarchive.net
    Updated Sep 30, 2025
    Cite
    The Cancer Imaging Archive (2025). Recurrent Breast Cancer: Histopathological and Hyperspectral Images Database [Dataset]. http://doi.org/10.7937/6kpy-yt49
    Available download formats: envi, mrxs, geojson, xlsx, n/a
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Sep 30, 2025
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Abstract

    Multimodal data has emerged as a promising tool to integrate diverse information, offering a more comprehensive perspective. This study introduces the HistologyHSI-BC-Recurrence Database, the first publicly accessible multimodal dataset designed to advance distant recurrence prediction in breast cancer (BC). The dataset comprises 47 histopathological whole-slide images (WSIs), 677 hyperspectral (HS) images, and demographic and clinical data from 47 BC patients, of whom 22 (47%) experienced distant recurrence over a 12-year follow-up. Histopathological slides were digitized using a WSI scanner and annotated by expert pathologists, while HS images were acquired with a bright-field microscope and a HS camera. This dataset provides a promising resource for BC recurrence prediction and personalized treatment strategies by integrating histopathological WSIs, HS images, and demographic and clinical data.

    Introduction

    Breast cancer (BC) is the most common cancer in women and a leading cause of cancer-related deaths, with metastasis being the main cause of death. About one-third of BC patients develop metastasis, which can be regional or distant, and survival rates drop dramatically with distant metastasis. Despite progress in identifying biomarkers associated with metastasis, there is no consensus for their clinical use. Imaging methods, such as X-ray, ultrasound, and magnetic resonance imaging, play a key role in detection, but histopathological diagnosis is crucial for treatment decisions. Digital pathology, utilizing whole-slide images (WSIs) and machine learning, is transforming BC diagnostics, integrating clinical data to improve prognostic accuracy. Hyperspectral imaging (HSI), which combines spatial and spectral information, is emerging as a promising tool for BC detection and prognosis. However, high-quality datasets integrating WSIs, HS images, and clinical data are scarce. This study introduces the HistologyHSI-BC-Recurrence Database, which includes WSIs, HS images, and clinical data from 47 BC patients, aiming to predict recurrence due to distant metastasis. This multimodal dataset will help develop predictive models, enhance diagnostic accuracy, and support research in computational pathology, ultimately improving personalized treatment strategies for BC.

    Methods

    Subject Inclusion and Exclusion Criteria

    This dataset includes data from 47 patients diagnosed with invasive ductal carcinoma (IDC) between 2006 and 2015. Of these, 22 patients experienced recurrence due to distant metastasis within 12 years, while 25 patients did not. Inclusion criteria required a diagnosis of IDC, a representative surgical biopsy, complete clinical and pathological data, and patient consent. Exclusion criteria were neoadjuvant treatment, recurrence that was regional rather than in distant organs, presence of distant metastases at diagnosis, or failure to meet the inclusion criteria.

    Data Acquisition

    Histopathology WSIs

    Paraffin blocks of primary tumor biopsies with sufficient representative IDC tissue were obtained from the Biobank IISPV-Node Tortosa, Tarragona, Spain. The samples were processed in the Pathology Department, where 2 µm-thick sections were prepared from each paraffin block and stained according to the standard H&E staining protocol. The slides were sealed with coverslips using dibutylphthalate polystyrene xylene (DPX) mounting medium for subsequent digitization and HS microscopic image acquisition. The H&E-stained slides were digitized with a WSI scanner (Pannoramic 250 Flash III, 3DHISTECH Ltd., Budapest, Hungary) at 20× magnification (0.2433 µm/pixel) using MRXS image format.

    Demographic and clinical data

    Demographic and clinical information was extracted from clinical records (please refer to HistologyHSI-BC-Recurrence-Clinical-Standardized-DataDictionary.xlsx).

    HS images

    The HS images were captured using a Hyperspec® VNIR A-Series pushbroom camera, which scans samples spatially and captures spectral data across 400-1,000 nm. The camera is paired with an Olympus BX-53 microscope and a scanning stage that ensures precise sample alignment. Calibration of the HS images is crucial to adjust for sensor response, light transmission, and source variation, achieved by normalizing pixel values using white and dark references. The system also generates synthetic RGB images for easier visualization of the data. In-house software facilitates sample navigation, synchronizes camera and microscope stage, and processes the data by removing noisy bands and generating calibrated cubes.

    Data Analysis

    WSIs were visualized using QuPath and anonymized with SlideMaster software. The quality of the histopathological slides was verified by pathologists, ensuring no artifacts were present due to tissue preparation or digitization. Pathologists manually annotated the images to differentiate between IDC, healthy tissue, and ductal carcinoma in situ (DCIS) using a color scheme (blue for IDC, green for healthy tissue, and red for DCIS). Annotations were initially made by one pathologist and then validated through a pairwise review with a second pathologist to ensure consistency and minimize inter-observer variability. Furthermore, regions of interest (ROIs) within these tissue types were identified and marked by yellow lines, for further HS imaging analysis.

    Usage Notes

    Data organization and naming conventions

    The database is divided into three main components:

    1. Clinical and demographic data (HistologyHSI-BC-Recurrence-Clinical-Standardized.xlsx)
    2. WSIs and corresponding tissue and ROI annotations (01_01_Histological_Images, 01_02_Tissue_Annotations, and 01_03_HSI_ROI_Annotations)
    3. HSI images (02_01_HSI_Images). The HSI data is stored in folders named according to the pattern HSI_VNIR_{P}_{TT}_x10_C{CN}, where {P} represents the patient ID, {TT} indicates the tissue type (IDC, healthy, or DCIS), and {CN} is the capture number.
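
    The folder naming convention above can be turned into a small parser. This is a minimal sketch, not the authors' tooling: the description does not say which characters {P}, {TT}, and {CN} may contain, so the character classes, the function name parse_hsi_folder, and the sample folder name are all assumptions made for illustration.

```python
import re

# Assumed concrete form of the documented template HSI_VNIR_{P}_{TT}_x10_C{CN}.
# The allowed characters for each field are not given in the description,
# so "anything except an underscore" and "digits" are guesses.
FOLDER_PATTERN = re.compile(
    r"^HSI_VNIR_(?P<patient>[^_]+)_(?P<tissue>[^_]+)_x10_C(?P<capture>\d+)$"
)


def parse_hsi_folder(name):
    """Return patient ID, tissue type, and capture number, or None if no match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    return {
        "patient": match.group("patient"),
        "tissue": match.group("tissue"),
        "capture": int(match.group("capture")),
    }


# Hypothetical folder name, invented only to exercise the parser.
example = parse_hsi_folder("HSI_VNIR_P01_IDC_x10_C3")
```

    Returning None for non-matching names makes it easy to skip unrelated folders when walking the 02_01_HSI_Images directory.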

    Working with HSIs

    HSI data is typically stored in specialized formats like .hdr files paired with .dat or .raw files, representing a multidimensional data cube. Python and MATLAB are usually employed for processing these data. See the External Resources section below for example code. First, calibration is essential, followed by optional processing like spectral dimensionality reduction to reduce noise and computational costs (e.g., reducing 826 spectral bands to 275 by averaging neighboring bands). Normalization can also be performed when needed, scaling data to a range or adjusting to have a mean of 0 and standard deviation of 1. Additionally, removing the sample background, typically the white areas, is recommended for more accurate analysis.
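
    The processing steps described above (calibration with white and dark references, band averaging, and normalization) can be sketched with NumPy. This is an illustrative sketch on synthetic data, not the authors' code: the function names, the reference values, and the choice of averaging groups of three bands and dropping the one leftover band (826 bands give 275 groups) are assumptions.

```python
import numpy as np


def calibrate(raw, white, dark):
    """Normalize pixel values as (raw - dark) / (white - dark)."""
    raw = raw.astype(np.float64)
    span = white.astype(np.float64) - dark
    # Guard against division by zero where white and dark references coincide.
    span = np.where(span == 0, np.finfo(np.float64).eps, span)
    return (raw - dark) / span


def average_bands(cube, group=3):
    """Average each run of `group` neighbouring bands; leftover bands are dropped."""
    usable = (cube.shape[-1] // group) * group
    trimmed = cube[..., :usable]
    return trimmed.reshape(*cube.shape[:-1], usable // group, group).mean(axis=-1)


def standardize(cube):
    """Shift and scale the whole cube to mean 0 and standard deviation 1."""
    std = cube.std()
    return (cube - cube.mean()) / (std if std > 0 else 1.0)


# Synthetic 4 x 5 pixel cube with 826 bands, standing in for a real capture.
rng = np.random.default_rng(0)
dark = np.full((4, 5, 826), 10)
white = np.full((4, 5, 826), 210)
raw = rng.integers(10, 211, size=(4, 5, 826))

calibrated = calibrate(raw, white, dark)
reduced = average_bands(calibrated, group=3)
normalized = standardize(reduced)
```

    Background removal is left out here because the description does not say how the white areas are detected.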

    Recommendations for software that can be used to open the data

    Visualizing histopathology WSIs

    The authors suggest using QuPath software to open and analyze WSIs (MRXS format) and annotations (GeoJSON format). WSIs can be loaded via drag and drop or through the "File/Open" option. Annotations for tissue compartments (IDC, healthy, DCIS) and ROIs (yellow rectangles for HS capture) should be imported as GeoJSON files.

  3. Demographic and Health Survey 2022 - Ghana

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1 more
    Updated Jan 19, 2024
    Cite
    Ghana Statistical Service (GSS) (2024). Demographic and Health Survey 2022 - Ghana [Dataset]. https://microdata.worldbank.org/index.php/catalog/6122
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    Ghana Statistical Service
    Authors
    Ghana Statistical Service (GSS)
    Time period covered
    2022 - 2023
    Area covered
    Ghana
    Description

    Abstract

    The 2022 Ghana Demographic and Health Survey (2022 GDHS) is the seventh in the series of DHS surveys conducted by the Ghana Statistical Service (GSS) in collaboration with the Ministry of Health/Ghana Health Service (MoH/GHS) and other stakeholders, with funding from the United States Agency for International Development (USAID) and other partners.

    The primary objective of the 2022 GDHS is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the GDHS collected information on:

    • Fertility levels and preferences, contraceptive use, antenatal and delivery care, maternal and child health, childhood mortality, childhood immunisation, breastfeeding and young child feeding practices, women’s dietary diversity, violence against women, gender, nutritional status of adults and children, awareness regarding HIV/AIDS and other sexually transmitted infections, tobacco use, and other indicators relevant for the Sustainable Development Goals
    • Haemoglobin levels of women and children
    • Prevalence of malaria parasitaemia (rapid diagnostic testing and thick slides for malaria parasitaemia in the field and microscopy in the lab) among children age 6–59 months
    • Use of treated mosquito nets
    • Use of antimalarial drugs for treatment of fever among children under age 5

    The information collected through the 2022 GDHS is intended to assist policymakers and programme managers in designing and evaluating programmes and strategies for improving the health of the country’s population.

    Geographic coverage

    National coverage

    Analysis unit

    • Household
    • Individual
    • Children age 0-5
    • Woman age 15-49
    • Man age 15-59

    Universe

    The survey covered all de jure household members (usual residents), all women aged 15-49, men aged 15-59, and all children aged 0-4 resident in the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    To achieve the objectives of the 2022 GDHS, a stratified representative sample of 18,540 households was selected in 618 clusters, which resulted in 15,014 interviewed women age 15–49 and 7,044 interviewed men age 15–59 (in one of every two households selected).

    The sampling frame used for the 2022 GDHS is the updated frame prepared by the GSS based on the 2021 Population and Housing Census. The sampling procedure used in the 2022 GDHS was stratified two-stage cluster sampling, designed to yield representative results at the national level, for urban and rural areas, and for each of the country’s 16 regions for most DHS indicators. In the first stage, 618 target clusters were selected from the sampling frame using a probability proportional to size strategy for urban and rural areas in each region. Then the targeted number of clusters was selected with equal probability systematic random sampling from the clusters selected in the first phase for urban and rural areas.

    In the second stage, after selection of the clusters, a household listing and map updating operation was carried out in all of the selected clusters to develop a list of households for each cluster. This list served as a sampling frame for selection of the household sample. The GSS organized a 5-day training course on listing procedures for listers and mappers with support from ICF. The listers and mappers were organized into 25 teams consisting of one lister and one mapper per team. The teams spent 2 months completing the listing operation. In addition to listing the households, the listers collected the geographical coordinates of each household using GPS dongles provided by ICF and in accordance with the instructions in the DHS listing manual. The household listing was carried out using tablet computers, with software provided by The DHS Program. A fixed number of 30 households in each cluster were randomly selected from the list for interviews.

    For further details on sample design, see APPENDIX A of the final report.

    Mode of data collection

    Face-to-face computer-assisted interviews [capi]

    Research instrument

    Four questionnaires were used in the 2022 GDHS: the Household Questionnaire, the Woman’s Questionnaire, the Man’s Questionnaire, and the Biomarker Questionnaire. The questionnaires, based on The DHS Program’s model questionnaires, were adapted to reflect the population and health issues relevant to Ghana. In addition, a self-administered Fieldworker Questionnaire collected information about the survey’s fieldworkers.

    The GSS organized a questionnaire design workshop with support from ICF and obtained input from government and development partners expected to use the resulting data. The DHS Program optional modules on domestic violence, malaria, and social and behavior change communication were incorporated into the Woman’s Questionnaire. ICF provided technical assistance in adapting the modules to the questionnaires.

    Cleaning operations

    DHS staff installed all central office programmes, data structure checks, secondary editing, and field check tables from 17–20 October 2022. Central office training was implemented using the practice data to test the central office system and field check tables. Seven GSS staff members (four male and three female) were trained on the functionality of the central office menu, including accepting clusters from the field, data editing procedures, and producing reports to monitor fieldwork.

    From 27 February to 17 March, DHS staff visited the Ghana Statistical Service office in Accra to work with the GSS central office staff on finishing the secondary editing and to clean and finalize all data received from the 618 clusters.

    Response rate

    A total of 18,540 households were selected for the GDHS sample, of which 18,065 were found to be occupied. Of the occupied households, 17,933 were successfully interviewed, yielding a response rate of 99%. In the interviewed households, 15,317 women age 15–49 were identified as eligible for individual interviews. Interviews were completed with 15,014 women, yielding a response rate of 98%. In the subsample of households selected for the male survey, 7,263 men age 15–59 were identified as eligible for individual interviews and 7,044 were successfully interviewed.
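
    The response rates quoted above follow directly from the counts in the paragraph. The short sketch below recomputes them; the function name response_rate is illustrative, and the rate for men is derived here from the stated counts rather than quoted from the source.

```python
def response_rate(completed, eligible):
    """Completed interviews as a percentage of eligible units."""
    return 100.0 * completed / eligible


# Counts taken from the response-rate paragraph.
household_rate = response_rate(17_933, 18_065)  # interviewed / occupied households
women_rate = response_rate(15_014, 15_317)      # interviewed / eligible women
men_rate = response_rate(7_044, 7_263)          # interviewed / eligible men

for label, rate in [("Households", household_rate),
                    ("Women 15-49", women_rate),
                    ("Men 15-59", men_rate)]:
    print(f"{label}: {rate:.1f}%")
```

    Note that the household rate uses occupied households, not all selected households, as its denominator.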

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2022 Ghana Demographic and Health Survey (2022 GDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2022 GDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results. A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.
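
    The relationship between variance, standard error, and the plus-or-minus-two-standard-errors interval can be written out in a few lines. This is a minimal sketch with made-up numbers: the estimate of 0.40 and the variance of 0.000225 are hypothetical and do not come from the survey.

```python
import math


def standard_error(variance):
    """The standard error is the square root of the variance of the estimate."""
    return math.sqrt(variance)


def confidence_interval(estimate, se, multiplier=2.0):
    """Interval of estimate plus or minus `multiplier` standard errors."""
    half_width = multiplier * se
    return estimate - half_width, estimate + half_width


# Hypothetical values chosen only to illustrate the arithmetic.
se = standard_error(0.000225)
low, high = confidence_interval(0.40, se)
print(f"SE = {se:.3f}, interval = ({low:.2f}, {high:.2f})")
```

    The variance passed in must already account for the survey design; plugging in a simple-random-sample variance would generally understate the interval for a clustered sample like this one.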

    If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2022 GDHS sample was the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the GDHS 2022 is an SAS program. This program used the Taylor linearization method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
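
    To make the idea of repeated replication concrete, here is a minimal delete-one-cluster jackknife for a ratio estimate. It is a sketch of a common textbook form of the method, not the SAS program used for the survey: it ignores stratification and weights, and the function name and toy cluster totals are assumptions.

```python
def jackknife_ratio_variance(clusters):
    """Delete-one-cluster jackknife variance of the ratio r = sum(y) / sum(x).

    `clusters` is a list of (y_total, x_total) pairs, one pair per cluster.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    y_sum = sum(y for y, _ in clusters)
    x_sum = sum(x for _, x in clusters)
    r = y_sum / x_sum
    squared_diffs = 0.0
    for y, x in clusters:
        r_without = (y_sum - y) / (x_sum - x)  # estimate with this cluster removed
        pseudo = k * r - (k - 1) * r_without   # pseudo-value for this replicate
        squared_diffs += (pseudo - r) ** 2
    return squared_diffs / (k * (k - 1))


# Toy cluster totals, invented for illustration.
toy_clusters = [(2.0, 1.0), (4.0, 1.0), (6.0, 1.0), (8.0, 1.0)]
variance = jackknife_ratio_variance(toy_clusters)
jackknife_se = variance ** 0.5
```

    With equal denominators the ratio is a plain mean, and the jackknife variance reduces to the sample variance divided by the number of clusters, which gives a convenient sanity check.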

    A more detailed description of estimates of sampling errors is presented in APPENDIX B of the survey report.

    Data appraisal

    Data Quality Tables

    • Age distribution of eligible and interviewed women
    • Age distribution of eligible and interviewed men
    • Age displacement at age 14/15
    • Age displacement at age 49/50
    • Pregnancy outcomes by years preceding the survey
    • Completeness of reporting
    • Standardisation exercise results from anthropometry training
    • Height and weight data completeness and quality for children
    • Height measurements from random subsample of measured children
    • Interference in height and weight measurements of children
    • Interference in height and weight measurements of women and men
    • Heaping in anthropometric measurements for children (digit preference)
    • Observation of mosquito nets
    • Observation of handwashing facility
    • School attendance by single year of age
    • Vaccination cards photographed
    • Number of
  4. Predict students' dropout and academic success

    • kaggle.com
    zip
    Updated Jan 3, 2023
    Cite
    The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
    Available download formats: zip (89332 bytes)
    Dataset updated
    Jan 3, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Predict students' dropout and academic success

    Investigating the Impact of Social and Economic Factors

    About this dataset

    This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, socio-economic factors, and academic performance information that can be used to analyze the possible predictors of student dropout and academic success. The dataset contains multiple disjoint databases consisting of relevant information available at the time of enrollment, such as application mode, marital status, course chosen, and more. The data can also be used to estimate overall student performance at the end of each semester by assessing curricular units credited, enrolled, evaluated, and approved, as well as their respective grades. Finally, it includes the unemployment rate, inflation rate, and GDP of the region, which can help explain how economic factors play into student dropout rates or academic success. The dataset covers a wide range of disciplines, such as agronomy, design, education, nursing, journalism, management, social service, and technologies.

    How to use the dataset

    This dataset can be used to understand and predict student dropouts and academic outcomes. The data includes a variety of demographic, social-economic and academic performance factors related to the students enrolled in higher education institutions. The dataset provides valuable insights into the factors that affect student success and could be used to guide interventions and policies related to student retention.

    Using this dataset, researchers can investigate two key questions:

    • Which specific predictive factors are linked with student dropout or completion?
    • How do different features interact with each other?

    For example, researchers could explore whether any demographic characteristics (e.g., gender, age at enrollment) or contextual conditions (e.g., unemployment rate in the region) are associated with higher student success rates, and what implications poverty has for educational outcomes. Answering these questions can give administrators critical information for formulating strategies that promote successful degree completion among students from diverse backgrounds.

    To use this dataset effectively, scientists should familiarize themselves with all of its variables, including categorical (qualitative) variables such as gender or application mode; numerical variables such as the number of curricular units at the beginning of each semester or age at enrollment; ordinal variables such as marital status; trends over time such as inflation rate or GDP; and frequency measures such as the percentage of scholarship holders. Scientists should also make sure they are aware of all potential bias in the data before running an analysis (for example, whether one population is underrepresented compared with another), since this could lead to unexpected results if not taken into account. Finally, practitioners should note that this Kaggle dataset contains only one semester's worth of information for each admission intake; studies conducted over a longer period might provide more accurate results.

    Research Ideas

    • Prediction of Student Retention: This dataset can be used to develop predictive models that identify student risk factors for dropout and support early interventions to improve the student retention rate.
    • Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.
    • Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...
  5. Multimodal Head and Neck cancer dataset

    • cancerimagingarchive.net
    n/a, svs and png
    Updated Nov 18, 2025
    Cite
    The Cancer Imaging Archive (2025). Multimodal Head and Neck cancer dataset [Dataset]. http://doi.org/10.7937/rcty-5h16
    Available download formats: svs, png, n/a
    Dataset updated
    Nov 18, 2025
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Nov 18, 2025
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    Abstract

    HANCOCK is a comprehensive, monocentric dataset of 763 head and neck cancer patients, including diverse data modalities. It contains histopathology imaging (whole-slide images of H&E-stained primary tumors and tissue microarrays with immunohistochemical staining) alongside structured clinical data (demographics, tumor pathology characteristics, laboratory blood measurements) and textual data (de-identified surgery reports and medical histories). All patients were treated curatively, and data span diagnoses from 2005–2019. This multimodal collection enables research into integrative analyses – for example, combining histologic features with clinical parameters for outcome prediction. Early analyses have demonstrated that fusing these modalities improves prognostic modeling compared to single-source data, and that leveraging histology with foundation models can enhance endpoint prediction. HANCOCK aims to facilitate precision oncology studies by providing a large public resource for developing and benchmarking multimodal machine learning methods in head and neck cancer.

    Introduction

    Head and neck cancer (HNC) is a prevalent malignancy with poor outcomes – it is the 7th most common cancer globally and carries a 5-year survival of only ~25–60% despite modern treatments. Improving patient prognosis may require personalized, multimodal therapy decisions, using information from pathology, clinical, and other data sources. However, progress in multimodal prediction has been limited by the lack of large public datasets that integrate these diverse data types. To our knowledge, existing HNC datasets are either small or incomplete; for example, a radiomics study included 288 oropharyngeal cases, and a proteomics-focused set with imaging had only 122 cases. The Cancer Genome Atlas (TCGA) provides multi-omics for >500 HNC cases, but lacks crucial data like pathology reports, blood tests, or comprehensive imaging for each patient. These limitations hinder robust multimodal research.

    HANCOCK was created to address this gap. It aggregates 763 patients’ data from a single academic center, capturing a real-world, uniformly treated cohort. The dataset uniquely combines whole slide histopathology images, tissue microarray images, detailed clinical parameters, pathology reports, and lab values in one resource. By curating and harmonizing these modalities, HANCOCK enables researchers to explore complex data interdependencies and develop multimodal predictive models. The patient population reflects typical HNC demographics – 80% male, median age 61, with 72% being former or current smokers – aligning with expected epidemiology and supporting generalizability. In summary, HANCOCK is an unprecedented multimodal HNC dataset that can fuel research in machine learning, prognostic biomarker discovery, and integrative oncology, ultimately advancing personalized head and neck cancer care.

    Methods

    The following sections describe how the HANCOCK data were collected, processed, and prepared for public sharing.

    Subject Inclusion and Exclusion Criteria

    Patients included in HANCOCK were those diagnosed with head and neck cancer between 2005 and 2019 at University Hospital Erlangen (Germany) who underwent a curative-intent initial treatment (surgery and/or definitive therapy). This encompasses cancers of the oral cavity, oropharynx, hypopharynx, and larynx. Patients treated palliatively or with recurrent/metastatic disease at presentation were excluded to focus on first-course, curative treatments. The cohort consists of 763 patients (approximately 80% male, 20% female) with a median age of 61 years. Notably, ~72% have a history of tobacco use, which is consistent with real-world HNC risk factors. The distribution of tumor subsites and stages reflects typical HNC presentation, and thus the dataset is broadly representative of the general HNC patient population. Being a single-center dataset, there is limited geographic diversity; however, the homogeneous data acquisition and treatment context reduce variability in data quality. No significant selection biases were introduced aside from the exclusion of non-curative cases – all major HNC subsite cases over the inclusion period were captured, providing a comprehensive real-world sample. Ethical approval was obtained for this retrospective data collection and sharing (Ethics Committee vote #23-22-Br), and all data were fully de-identified prior to release.

    Data Acquisition

    Histopathology: Tissue specimens from the primary tumors (and involved lymph nodes, if present) were obtained from the pathology archives. All samples were formalin-fixed and paraffin-embedded (FFPE) and stained with hematoxylin and eosin (H&E) following routine protocols. Digital whole-slide imaging was performed on these histology slides. A total of 709 H&E slides of primary tumor tissue (701 patients had one slide, 8 patients had two slides) were scanned at high resolution using a 3DHISTECH P1000 scanner at an effective 82.44× magnification (0.1213 µm/pixel). Additionally, 396 H&E slides of lymph node metastases were scanned, using two systems: an Aperio Leica GT450 at 40× (0.2634 µm/pixel) and the 3DHISTECH P1000 at ~51× (0.1945 µm/pixel). (Multiple scanners were utilized over the course of the project; all resulting images were cross-verified for quality.) The digital whole slide images (WSIs) are provided in the pyramidal Aperio SVS format, a TIFF-based format compatible with standard viewers.

    In addition to full slides, tissue microarrays (TMAs) were constructed from each patient’s tumor block to sample important regions. For each case, two cylindrical core biopsies (diameter 1.5 mm) were taken – one from the tumor center and one from the invasive tumor front. These cores were assembled into TMA blocks and stained on separate slides with a panel of eight stains: H&E plus immunohistochemical (IHC) markers targeting various immune cells and tumor biomarkers. The IHC markers include CD3, CD8, CD56, CD68, CD163, PD-L1, and MHC-1, which label T cells (CD3, CD8), natural killer cells (CD56), monocytes/macrophages (CD68, CD163), and a tumor immune checkpoint ligand (PD-L1), as well as MHC class I expression. Each core appears on up to 8 stained TMA slides (one per stain), yielding up to 16 TMA images per patient (two cores × eight stains). In the dataset, TMA images are provided for both the tumor-center and tumor-front cores; these too are digitized high-resolution images (consistent microscope settings, ~40×). The combination of WSIs and TMAs yields a rich imaging dataset: 701 patients have at least one primary tumor WSI (62 patients lack WSIs due to unavailable tissue), and all patients have TMA core images unless the tumor block was exhausted. This imaging data offers both broad tissue context from WSIs and targeted cellular detail from TMAs. Manual tumor region annotations are also included for the primary tumor WSIs (see Data Analysis below).

    Clinical and Pathology Data: A wide array of non-imaging data was extracted from hospital information systems and pathology reports for each patient. Key demographic variables (age, sex, etc.) and tumor pathology details were collected, including primary tumor site, histologic subtype, grade, TNM stage, resection margin status, depth of invasion, perineural and lymphovascular invasion, and nodal metastasis status. These pathology parameters were recorded in a structured format for each case. Standard clinical coding systems were used where applicable: e.g., diagnoses are coded with ICD-10 codes and procedures with OPS codes (the German procedure classification system). The dataset includes these codes for each patient’s conditions and treatments. Comprehensive laboratory blood test results at diagnosis or pre-treatment were also compiled, covering complete blood counts, coagulation measures, electrolytes, kidney function, C-reactive protein, and other relevant analytes. Reference ranges for each lab parameter are provided alongside the values to indicate whether a result was normal or abnormal. Most patients have a full panel of these lab results, though some values are missing if a test was not clinically indicated; the dataset notes availability per patient. All structured data have been cleaned and validated – for example, harmonizing category values and checking consistency (e.g. TNM stages align with recorded tumor sites).

    Textual Data (Surgical Reports and Histories): Unstructured clinical text was also included to add rich context on treatment details. Surgery reports (operative notes) from the primary tumor resection and associated medical history summaries were retrieved from the hospital’s electronic records. For each patient, the operative report from their first definitive surgery and the corresponding

  6. Graphical library of population dynamics in 104 towns and villages of...

    • arcticdata.io
    Updated Apr 15, 2024
    Cite
    Lawrence Hamilton (2024). Graphical library of population dynamics in 104 towns and villages of Arctic/Subarctic Alaska, 1990-2022. [Dataset]. http://doi.org/10.18739/A25H7BW29
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Arctic Data Center
    Authors
    Lawrence Hamilton
    Time period covered
    Jan 1, 1990 - Jan 1, 2022
    Description

    These files contain individual graphs tracking population dynamics in 104 individual Arctic/Subarctic Alaska communities, over the years from 1990 to 2022. The numerical data underlying these graphs have been archived separately with the Arctic Data Center: Hamilton, L.C. 2023. “Annual population, natural increase and net migration for rural Alaska communities 1990–2022.” Dataset archived with the NSF Arctic Data Center. https://arcticdata.io/catalog/view/doi:10.18739/A28K74Z2B The purpose of this "graphical library" is to provide visualizations of 1990-2022 population change for each town or village in a format that is simple to download, share, and apply to other purposes such as planning, proposals or case studies. The files (identical pdf and PowerPoint versions) include a brief rationale, illustration of the numerical database organization, description of sources, citations and links to published articles, and explanation of the graphical style. These notes are followed by 104 individual graphs, one per slide, organized by boroughs or census areas.

  7. Population at Risk of Malaria - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Aug 27, 2025
    Cite
    (2025). Population at Risk of Malaria - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/population-at-risk-of-malaria
    Dataset updated
    Aug 27, 2025
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Malaria poses a risk to approximately 3.3 billion people, or approximately half of the world's population. Most malaria cases occur in Sub-Saharan Africa. Asia, Latin America, and to a lesser extent the Middle East and parts of Europe are also affected. According to the Global Malaria Report published by the World Health Organization (WHO), malaria was present in 106 countries and territories in 2010, and there were an estimated 216 million cases of malaria and nearly 0.7 million deaths, mostly among children living in Africa. In this research, we have estimated the current population exposed to malaria by country. In our computation, we have distinguished areas of high, medium, and low prevalence ("endemicity") of malaria in each country based on the global malaria atlas compiled by the Malaria Atlas Project (MAP) of Oxford University. The data are based on 24,492 parasite rate surveys (Plasmodium falciparum, 24,178; Plasmodium vivax, 8,866) from an aggregated sample of 4,373,066 slides prepared from blood samples taken in 85 countries. The MAP study employs a new cartographic technique for deriving global clinical burden estimates of Plasmodium falciparum malaria for 2007. These estimates are then compared with those derived under existing surveillance-based approaches to arrive at the final data used in the malaria mapping (Hay et al., 2009). (http://www.map.ox.ac.uk/media/maps/pdf/mean/World_mean.pdf, accessed 2012) Malaria maps generally separate malaria endemicity into three broad categories by Plasmodium falciparum parasite rate (PfPR), a commonly reported index of malaria transmission intensity: PfPR < 5% as low endemicity, PfPR 5%-40% as medium/intermediate endemicity, and PfPR > 40% as high endemicity. In our research, global mapping techniques were used to estimate population exposed to malaria. The malaria endemicity maps were overlaid on global population maps from LandScan 2005 (Dobson, 2000), and country-level population exposure in the three endemicity areas was computed. Because of the spatial reference of the data and the number of observations in the combined data, Geographic Information Systems functions from ESRI ArcGIS (v 9.3.1) were used and automated in the Python (v 2.5) language.
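
    The three PfPR bands and the per-class population totals described above can be illustrated with a short Python sketch. This is not the authors' ArcGIS workflow: the function names, the (PfPR, population) cell representation, and the handling of the exact 5% and 40% boundaries (both placed in the medium band, following the "5%-40%" wording) are assumptions made for illustration. Cells outside malaria-affected areas are assumed to have been filtered out beforehand.

```python
from typing import Dict, Iterable, Tuple


def endemicity_class(pfpr_percent: float) -> str:
    """Map a P. falciparum parasite rate (in percent) to an endemicity band."""
    if not 0.0 <= pfpr_percent <= 100.0:
        raise ValueError("PfPR must be a percentage between 0 and 100")
    if pfpr_percent < 5.0:
        return "low"
    if pfpr_percent <= 40.0:  # assumption: 5% and 40% both count as medium
        return "medium"
    return "high"


def exposure_by_class(cells: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Sum population per endemicity band.

    Each cell is a hypothetical (pfpr_percent, population) pair, standing in
    for one grid cell of an endemicity map overlaid on a population map.
    """
    totals = {"low": 0.0, "medium": 0.0, "high": 0.0}
    for pfpr_percent, population in cells:
        totals[endemicity_class(pfpr_percent)] += population
    return totals
```

    Running `exposure_by_class` once per country on that country's cells would give the country-level exposure in each of the three endemicity areas.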

  8. CA Zip Code Boundaries

    • data.ca.gov
    • gis.data.ca.gov
    • +1 more
    Updated Apr 16, 2025
    Cite
    California Department of Technology (2025). CA Zip Code Boundaries [Dataset]. https://data.ca.gov/dataset/ca-zip-code-boundaries
    Available download formats: csv, arcgis geoservices rest api, geojson, gpkg, html, zip, txt, kml, gdb, xlsx
    Dataset updated
    Apr 16, 2025
    Dataset authored and provided by
    California Department of Technology (http://cdt.ca.gov/)
    Area covered
    California
    Description
    This feature service is derived from the Esri "United States Zip Code Boundaries" layer, queried to only CA data.


    Published by the California Department of Technology Geographic Information Services Team.
    The GIS Team can be reached at ODSdataservices@state.ca.gov.

    U.S. ZIP Code Boundaries represents five-digit ZIP Code areas used by the U.S. Postal Service to deliver mail more effectively. The first digit of a five-digit ZIP Code divides the United States into 10 large groups of states (or equivalent areas) numbered from 0 in the Northeast to 9 in the far West. Within these areas, each state is divided into an average of 10 smaller geographical areas, identified by the second and third digits. These digits, in conjunction with the first digit, represent a Sectional Center Facility (SCF) or a mail processing facility area. The fourth and fifth digits identify a post office, station, branch or local delivery area.
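
    The digit structure described above can be sketched in Python. This is a minimal illustration only: `split_zip` and the `ZipParts` field names are hypothetical and are not part of this dataset or of any Esri or USPS interface.

```python
from typing import NamedTuple


class ZipParts(NamedTuple):
    national_area: str     # first digit: one of 10 large groups of states
    sectional_center: str  # first three digits: SCF / mail processing area
    delivery_area: str     # fourth and fifth digits: post office or delivery area


def split_zip(zip_code: str) -> ZipParts:
    """Split a five-digit ZIP Code into the parts named in the description.

    The code is kept as a string so that leading zeros (Northeast codes)
    are preserved.
    """
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError("expected a five-digit ZIP Code string")
    return ZipParts(zip_code[0], zip_code[:3], zip_code[3:])
```

    For example, `split_zip("94105")` separates the code into national area "9", sectional center "941", and delivery area "05".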

    As of the time this layer was published, in January 2025, Esri's boundaries are sourced from TomTom (June 2024) and the 2023 population estimates are from Esri Demographics. Esri updates its layer annually and those changes will immediately be reflected in this layer. Note that, because this layer passes through Esri's data, if you want to know the true date of the underlying data, click through to Esri's original source data and look at their metadata for more information on updates.

    Cautions about using Zip Code boundary data
    Zip code boundaries have three characteristics you should be aware of before using them:
    1. Zip code boundaries change, in ways small and large - these are not a stable analysis unit. Data you received keyed to zip codes may have used an earlier and very different boundary for your zip codes of interest.
    2. Historically, the United States Postal Service has not published zip code boundaries, and instead, boundary datasets are compiled by third party vendors from address data. That means that the boundary data are not authoritative, and any data you have keyed to zip codes may use a different, vendor-specific method for generating boundaries from the data here.
    3. Zip codes are designed to optimize mail delivery, not social, environmental, or demographic characteristics. Analysis using zip codes is prone to the Modifiable Areal Unit Problem, which will bias results because your units of analysis aren't designed for the data being studied.
    As of early 2025, USPS appears to be in the process of releasing boundaries, which will at least provide an authoritative source, but because of the other factors above, we do not recommend these boundaries for many use cases. If you are using these for anything other than mailing purposes, we recommend reconsideration. We provide the boundaries as a convenience, knowing people are looking for them, in order to ensure that up-to-date boundaries are available.

  9. 2015 - Survey - National opinion trends

    • opalpro.cs.upb.de
    • data.europa.eu
    Updated Feb 13, 2017
    + more versions
    Cite
    diceupb (2017). 2015 - Survey - National opinion trends [Dataset]. http://opalpro.cs.upb.de:5000/vi/dataset/2015_-_survey_-_national_opinion_trends
    Available download formats: http://publications.europa.eu/resource/authority/file-type/pdf
    Dataset updated
    Feb 13, 2017
    Dataset provided by
    diceupb
    License

    http://publications.europa.eu/resource/authority/licence/COM_REUSE

    Description

    A national analysis has been carried out as a follow-up to the exploratory study ‘Major changes in European public opinion with regard to the EU’, which showed how public opinion had changed in the 28 Member States since 1973.

    The new national analysis is made up of three Powerpoint presentations that show how public opinion in each of the Member States has changed since 2007.

    1. The first presentation, ‘national public opinion trends’, analyses how the answers to key Eurobarometer questions changed in each Member State between 2007 and 2015, in particular: The image of the EP, the role of the EP and the membership of the EU.

    2. The second presentation, which also focuses on individual Member States, is devoted to socio demographic trends. It shows the main differences between the EU average and the national results for the key questions referred to above and for others. It breaks trends down by gender, age and socio-professional category.

    3. The third presentation deals more specifically with topics relating to ‘identity and EU citizenship’. The changes in public opinion between 2007 and 2015 are dealt with on a national basis and compared with the European average. On a socio-demographic level, a specific analysis was made of the differences between age groups.

  10. Data from: Contrasting demographic processes underlie uphill shifts in a...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Oct 21, 2024
    Cite
    Sarah Skikne; Blair McLaughlin; Mark Fisher; David Ackerly; Erika Zavaleta (2024). Contrasting demographic processes underlie uphill shifts in a desert ecosystem [Dataset]. http://doi.org/10.5061/dryad.pk0p2ngz6
    Available download formats: zip
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Dryad
    Authors
    Sarah Skikne; Blair McLaughlin; Mark Fisher; David Ackerly; Erika Zavaleta
    Time period covered
    Oct 1, 2024
    Description

    Contrasting demographic processes underlie uphill shifts in a desert ecosystem

    https://doi.org/10.5061/dryad.pk0p2ngz6

    Description of the data and file structure

    Files and variables

    File: individ_plant_outcomes.csv

    Description: Data on demographic outcomes for individual plants extracted from paired historical-modern photos taken along the Deep Canyon Transect in Riverside County, California.

    Variables
    • site: site name. Map of sites can be found in the Supplementary Materials of Skikne et al. 2024. Contrasting demographic processes underlie uphill shifts in a desert ecosystem. Ecology.
    • spp: species
    • extant_t1: whether the plant existed in the historical photo (1) or not (0)
    • alive_t1: whether the plant was alive in the historical photo (1) or not (0)
    • height_t1: height of the plant in the historical photo (unitless, see below)
    • width_t1: width of the plant in the historical photo (unitless, see below)...

  11. Regions of Homozygosity in the Porcine Genome: Consequence of Demography and...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    tiff
    Updated May 31, 2023
    Cite
    Mirte Bosse; Hendrik-Jan Megens; Ole Madsen; Yogesh Paudel; Laurent A. F. Frantz; Lawrence B. Schook; Richard P. M. A. Crooijmans; Martien A. M. Groenen (2023). Regions of Homozygosity in the Porcine Genome: Consequence of Demography and the Recombination Landscape [Dataset]. http://doi.org/10.1371/journal.pgen.1003100
    Available download formats: tiff
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Genetics
    Authors
    Mirte Bosse; Hendrik-Jan Megens; Ole Madsen; Yogesh Paudel; Laurent A. F. Frantz; Lawrence B. Schook; Richard P. M. A. Crooijmans; Martien A. M. Groenen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Inbreeding has long been recognized as a primary cause of fitness reduction in both wild and domesticated populations. Consanguineous matings cause inheritance of haplotypes that are identical by descent (IBD) and result in homozygous stretches along the genome of the offspring. Size and position of regions of homozygosity (ROHs) are expected to correlate with genomic features such as GC content and recombination rate, but also direction of selection. Thus, ROHs should be non-randomly distributed across the genome. Therefore, demographic history may not fully predict the effects of inbreeding. The porcine genome has a relatively heterogeneous distribution of recombination rate, making Sus scrofa an excellent model to study the influence of both recombination landscape and demography on genomic variation. This study utilizes next-generation sequencing data for the analysis of genomic ROH patterns, using a comparative sliding window approach. We present an in-depth study of genomic variation based on three different parameters: nucleotide diversity outside ROHs, the number of ROHs in the genome, and the average ROH size. We identified an abundance of ROHs in all genomes of multiple pigs from commercial breeds and wild populations from Eurasia. Size and number of ROHs are in agreement with known demography of the populations, with population bottlenecks highly increasing ROH occurrence. Nucleotide diversity outside ROHs is high in populations derived from a large ancient population, regardless of current population size. In addition, we show an unequal genomic ROH distribution, with strong correlations of ROH size and abundance with recombination rate and GC content. Global gene content does not correlate with ROH frequency, but some ROH hotspots do contain positive selected genes in commercial lines and wild populations. 
    This study highlights the importance of the influence of demography and recombination on homozygosity in the genome to understand the effects of inbreeding.

  12. S2 Table -

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jun 16, 2023
    Cite
    Begoña Martínez-Cruz; Hanna Zalewska; Andrzej Zalewski (2023). S2 Table - [Dataset]. http://doi.org/10.1371/journal.pone.0266161.s003
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Begoña Martínez-Cruz; Hanna Zalewska; Andrzej Zalewski
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sliding-window cohorts for (A) microsatellites and (B) mtDNA sequences. Samples were subdivided into cohorts of four years; n indicates the number of individuals in each cohort. Only cohorts comprising eight or more individuals were considered for the analyses and are listed here. (XLSX)
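
    The cohort construction described above can be sketched as follows. This is an illustrative reading, not the authors' procedure: the function name, the one-year step between consecutive windows, and the use of inclusive start and end years are assumptions. Only the four-year window length and the minimum of eight individuals come from the description.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sliding_cohorts(
    sample_years: Iterable[int],
    window: int = 4,   # cohort length in years (from the description)
    min_n: int = 8,    # minimum individuals per cohort (from the description)
    step: int = 1,     # assumption: windows advance one year at a time
) -> List[Tuple[int, int, int]]:
    """Return (start_year, end_year, n) for each window with at least min_n samples."""
    counts = Counter(sample_years)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    cohorts = []
    # Slide the window so that it always fits inside the sampled period.
    for start in range(first, last - window + 2, step):
        end = start + window - 1
        n = sum(counts[year] for year in range(start, end + 1))
        if n >= min_n:
            cohorts.append((start, end, n))
    return cohorts
```

    Each input value is the sampling year of one individual, so a year appears once per individual sampled in it.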

  13. Survey of Consumer Finances

    • federalreserve.gov
    Updated Oct 18, 2023
    + more versions
    Cite
    Board of Governors of the Federal Reserve Board (2023). Survey of Consumer Finances [Dataset]. http://doi.org/10.17016/8799
    Dataset updated
    Oct 18, 2023
    Dataset provided by
    Federal Reserve Board of Governors
    Federal Reserve System (http://www.federalreserve.gov/)
    Authors
    Board of Governors of the Federal Reserve Board
    Time period covered
    1962 - 2023
    Description

    The Survey of Consumer Finances (SCF) is normally a triennial cross-sectional survey of U.S. families. The survey data include information on families' balance sheets, pensions, income, and demographic characteristics.

  14. India Population

    • tradingeconomics.com
    • id.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Updated Oct 10, 2012
    Cite
    TRADING ECONOMICS (2012). India Population [Dataset]. https://tradingeconomics.com/india/population
    Available download formats: json, excel, xml, csv
    Dataset updated
    Oct 10, 2012
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 1950 - Dec 31, 2024
    Area covered
    India
    Description

    The total population in India was estimated at 1398.6 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides - India Population - actual values, historical data, forecast, chart, statistics, economic calendar and news.

  15. GLA Demography - Comparison of available population estimates

    • gimi9.com
    Updated Apr 5, 2023
    + more versions
    Cite
    (2023). GLA Demography - Comparison of available population estimates [Dataset]. https://gimi9.com/dataset/london_comparison-of-available-population-estimates/
    Dataset updated
    Apr 5, 2023
    Description

    At the April 2023 meeting of the Population Statistics User Group, the GLA Demography team presented an overview of currently available sources of population estimates for the previous decade, namely:
    • The original ONS mid-year population estimates (including rolled-forward estimates for 2021)
    • Experimental outputs from the ONS's Dynamic Population Model
    • The modelled population backseries produced by the GLA to act as inputs to our 2021-based interim population projections

    The slides from the presentation are published here together with packages of comparison plots for all local authority districts and regions in England to allow users to easily view some of the key differences between the sources for their own areas. The plots also include comparisons of the Dynamic Population Model's provisional 2022 estimates of births with the modelled estimates of recent births produced by the GLA.
