15 datasets found

Adult Social Care slides - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Aug 14, 2020
Cite
ckan.publishing.service.gov.uk (2020). Adult Social Care slides - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/adult-social-care-slides
Dataset updated
Aug 14, 2020
Dataset provided by
CKAN (https://ckan.org/)
Description
Unequal impact of COVID-19: BAME disproportionality.
Section 1 (slides 1–3): The Public Health England (PHE) review confirms the risk of mortality as a result of COVID-19 by ethnicity. Data on access to care and emergency response has been taken from our local VCS partner feedback and indications from local data. Data on the care workforce by ethnicity was taken from our local data.
Section 2 (slides 4–7) covers demographic information on Black, Asian, and other or mixed ethnic people delivering direct care in the wider social care sector, from the Skills for Care 2019 Social Care Workforce Review (note: factors that need to be considered are age, sex, underlying health conditions, ethnicity, and pregnancy). Information on Camden’s ASC workforce was taken from GLA 2016-based Ethnic Group Projections (mid-2020). Demographic information on people receiving ASC support in Camden has been taken from our local service data.
Section 3 (slides 8–15) sets out information on Adult Social Care activity during COVID-19 and looks at data relative to ethnicity, including the ASC cohort of Camden’s shielded residents (service-held data, NOT official statistics, including qualitative feedback from communities).
Section 4 (slides 16–18) shows information related to the Adult Social Care Outcomes Framework, which has provided some information gathered before COVID-19 on the experiences of people who are BAME and in receipt of social care support in Camden.
Recurrent Breast Cancer: Histopathological and Hyperspectral Images Database...
cancerimagingarchive.net
Updated Sep 30, 2025
Cite
The Cancer Imaging Archive (2025). Recurrent Breast Cancer: Histopathological and Hyperspectral Images Database [Dataset]. http://doi.org/10.7937/6kpy-yt49
Available download formats: envi, mrxs, geojson, xlsx, n/a
Unique identifier
https://doi.org/10.7937/6kpy-yt49
Dataset updated
Sep 30, 2025
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Sep 30, 2025
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
Abstract
Multimodal data has emerged as a promising tool to integrate diverse information, offering a more comprehensive perspective. This study introduces the HistologyHSI-BC-Recurrence Database, the first publicly accessible multimodal dataset designed to advance distant recurrence prediction in breast cancer (BC). The dataset comprises 47 histopathological whole-slide images (WSIs), 677 hyperspectral (HS) images, and demographic and clinical data from 47 BC patients, of whom 22 (47%) experienced distant recurrence over a 12-year follow-up. Histopathological slides were digitized using a WSI scanner and annotated by expert pathologists, while HS images were acquired with a bright-field microscope and a HS camera. This dataset provides a promising resource for BC recurrence prediction and personalized treatment strategies by integrating histopathological WSIs, HS images, and demographic and clinical data.
Introduction
Breast cancer (BC) is the most common cancer in women and a leading cause of cancer-related deaths, with metastasis being the main cause of death. About one-third of BC patients develop metastasis, which can be regional or distant, and survival rates drop dramatically with distant metastasis. Despite progress in identifying biomarkers associated with metastasis, there is no consensus for their clinical use. Imaging methods, such as X-ray, ultrasound, and magnetic resonance imaging, play a key role in detection, but histopathological diagnosis is crucial for treatment decisions. Digital pathology, utilizing whole-slide images (WSIs) and machine learning, is transforming BC diagnostics, integrating clinical data to improve prognostic accuracy. Hyperspectral imaging (HSI), which combines spatial and spectral information, is emerging as a promising tool for BC detection and prognosis. However, high-quality datasets integrating WSIs, HS images, and clinical data are scarce. This study introduces the HistologyHSI-BC-Recurrence Database, which includes WSIs, HS images, and clinical data from 47 BC patients, aiming to predict recurrence due to distant metastasis. This multimodal dataset will help develop predictive models, enhance diagnostic accuracy, and support research in computational pathology, ultimately improving personalized treatment strategies for BC.
Methods
Subject Inclusion and Exclusion Criteria
This dataset includes data from 47 patients diagnosed with invasive ductal carcinoma (IDC) between 2006 and 2015. Of these, 22 patients experienced recurrence due to distant metastasis within 12 years, while 25 patients did not. Inclusion criteria required a diagnosis of IDC, representative surgical biopsy, complete clinical and pathological data, and patient consent. Exclusion criteria involved receiving neoadjuvant treatment, regional recurrence rather than in distant organs, presence of distant metastases at diagnosis, or failure to meet inclusion criteria.
Data Acquisition
Histopathology WSIs
Paraffin blocks of primary tumor biopsies with sufficient representative IDC tissue were obtained from the Biobank IISPV-Node Tortosa, Tarragona, Spain. The samples were processed in the Pathology Department, where 2 µm-thick sections were prepared from each paraffin block and stained according to the standard H&E staining protocol. The slides were sealed with coverslips using dibutylphthalate polystyrene xylene (DPX) mounting medium for subsequent digitization and HS microscopic image acquisition. The H&E-stained slides were digitized with a WSI scanner (Pannoramic 250 Flash III, 3DHISTECH Ltd., Budapest, Hungary) at 20× magnification (0.2433 µm/pixel) using MRXS image format.
Demographic and clinical data
Demographic and clinical information was extracted from clinical records (please refer to the HistologyHSI-BC-Recurrence-Clinical-Standardized-DataDictionary.xlsx).
HS images
The HS images were captured using a Hyperspec® VNIR A-Series pushbroom camera, which scans samples spatially and captures spectral data across 400-1,000 nm. The camera is paired with an Olympus BX-53 microscope and a scanning stage that ensures precise sample alignment. Calibration of the HS images is crucial to adjust for sensor response, light transmission, and source variation, achieved by normalizing pixel values using white and dark references. The system also generates synthetic RGB images for easier visualization of the data. In-house software facilitates sample navigation, synchronizes camera and microscope stage, and processes the data by removing noisy bands and generating calibrated cubes.
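The white/dark reference normalization mentioned above is commonly written as (raw - dark) / (white - dark). The following is a minimal NumPy sketch of that formula, not the authors' in-house software: the function name `calibrate_cube`, the `eps` guard, the (rows, cols, bands) array layout, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def calibrate_cube(raw, white, dark, eps=1e-8):
    """Flat-field calibration: (raw - dark) / (white - dark).

    All three arrays are assumed to share (or broadcast to) the shape
    (rows, cols, bands). `eps` avoids division by zero on dead pixels.
    """
    raw = np.asarray(raw, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom

# Synthetic check: build a raw cube from a known transmittance and recover it.
rng = np.random.default_rng(0)
dark = np.full((4, 5, 6), 10.0)      # hypothetical dark-reference counts
white = np.full((4, 5, 6), 210.0)    # hypothetical white-reference counts
true_transmittance = rng.uniform(0.0, 1.0, size=(4, 5, 6))
raw = dark + true_transmittance * (white - dark)
calibrated = calibrate_cube(raw, white, dark)
```

On real captures the white and dark references would come from the acquisition system rather than constant arrays.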
Data Analysis
WSIs were visualized using QuPath and anonymized with SlideMaster software. The quality of the histopathological slides was verified by pathologists, ensuring no artifacts were present due to tissue preparation or digitization. Pathologists manually annotated the images to differentiate between IDC, healthy tissue, and ductal carcinoma in situ (DCIS) using a color scheme (blue for IDC, green for healthy tissue, and red for DCIS). Annotations were initially made by one pathologist and then validated through a pairwise review with a second pathologist to ensure consistency and minimize inter-observer variability. Furthermore, regions of interest (ROIs) within these tissue types were identified and marked by yellow lines, for further HS imaging analysis.
Usage Notes
Data organization and naming conventions
The database is divided into three main components:
Clinical and demographic data (HistologyHSI-BC-Recurrence-Clinical-Standardized.xlsx)
WSIs and corresponding tissue and ROI annotations (01_01_Histological_Images, 01_02_Tissue_Annotations, and 01_03_HSI_ROI_Annotations)
HSI images (02_01_HSI_Images). The HSI data is stored in folders named according to the pattern HSI_VNIR_{P}_{TT}_x10_C{CN}, where {P} represents the patient ID, {TT} indicates the tissue type (IDC, healthy, or DCIS), and {CN} is the capture number.
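The folder naming convention above can be parsed with a short regular expression. This is a sketch under assumptions: the patient ID and tissue type are taken to contain no underscores, the capture number is taken to be numeric, and the example folder name is hypothetical rather than copied from the dataset.

```python
import re

# Assumed shape of HSI_VNIR_{P}_{TT}_x10_C{CN}; adjust if real IDs contain underscores.
FOLDER_RE = re.compile(
    r"^HSI_VNIR_(?P<patient>[^_]+)_(?P<tissue>[^_]+)_x10_C(?P<capture>\d+)$"
)

def parse_hsi_folder(name):
    """Return patient, tissue type, and capture number, or None if no match."""
    m = FOLDER_RE.match(name)
    if m is None:
        return None
    return {
        "patient": m["patient"],
        "tissue": m["tissue"],
        "capture": int(m["capture"]),
    }

# Hypothetical folder name, for illustration only.
parsed = parse_hsi_folder("HSI_VNIR_P01_IDC_x10_C3")
```

Returning None for non-matching names makes it easy to skip unrelated folders when walking the 02_01_HSI_Images directory.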
Working with HSIs
HSI data is typically stored in specialized formats like .hdr files paired with .dat or .raw files, representing a multidimensional data cube. Python and MATLAB are usually employed for processing these data. See the External Resources section below for example code. First, calibration is essential, followed by optional processing like spectral dimensionality reduction to reduce noise and computational costs (e.g., reducing 826 spectral bands to 275 by averaging neighboring bands). Normalization can also be performed when needed, scaling data to a range or adjusting to have a mean of 0 and standard deviation of 1. Additionally, removing the sample background, typically the white areas, is recommended for more accurate analysis.
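The band reduction and normalization steps described above can be sketched in NumPy. Averaging groups of three neighboring bands turns 826 bands into 275 if the one leftover band is dropped; how the authors handled that remainder is not stated, so dropping it is an assumption, as are the function names and the choice to standardize over the whole cube rather than per pixel or per band.

```python
import numpy as np

def bin_bands(cube, group=3):
    """Average each run of `group` neighboring bands; drop any leftover bands."""
    rows, cols, bands = cube.shape
    usable = (bands // group) * group
    trimmed = cube[:, :, :usable]
    return trimmed.reshape(rows, cols, usable // group, group).mean(axis=3)

def standardize(cube, eps=1e-12):
    """Scale to mean 0 and standard deviation 1 over the whole cube."""
    return (cube - cube.mean()) / max(cube.std(), eps)

# Synthetic cube with 826 bands standing in for a calibrated capture.
cube = np.random.default_rng(0).uniform(size=(2, 3, 826))
reduced = bin_bands(cube, group=3)
scaled = standardize(reduced)
```

Binning before normalization keeps the statistics consistent with the data that a downstream model would actually see.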
Recommendations for software that can be used to open the data
Visualizing histopathology WSIs
The authors suggest using QuPath software to open and analyze WSIs (MRXS format) and annotations (GeoJSON format). WSIs can be loaded via drag and drop or through the "File/Open" option. Annotations for tissue compartments (IDC, healthy, DCIS) and ROIs (yellow rectangles for HS capture) should be imported as GeoJSON files.
Demographic and Health Survey 2022 - Ghana
microdata.worldbank.org
catalog.ihsn.org
Updated Jan 19, 2024
Cite
Ghana Statistical Service (GSS) (2024). Demographic and Health Survey 2022 - Ghana [Dataset]. https://microdata.worldbank.org/index.php/catalog/6122
Dataset updated
Jan 19, 2024
Dataset provided by
Ghana Statistical Services
Authors
Ghana Statistical Service (GSS)
Time period covered
2022 - 2023
Area covered
Ghana
Description
Abstract

The 2022 Ghana Demographic and Health Survey (2022 GDHS) is the seventh in the series of DHS surveys conducted by the Ghana Statistical Service (GSS) in collaboration with the Ministry of Health/Ghana Health Service (MoH/GHS) and other stakeholders, with funding from the United States Agency for International Development (USAID) and other partners.

The primary objective of the 2022 GDHS is to provide up-to-date estimates of basic demographic and health indicators. Specifically, the GDHS collected information on:
- Fertility levels and preferences, contraceptive use, antenatal and delivery care, maternal and child health, childhood mortality, childhood immunisation, breastfeeding and young child feeding practices, women’s dietary diversity, violence against women, gender, nutritional status of adults and children, awareness regarding HIV/AIDS and other sexually transmitted infections, tobacco use, and other indicators relevant for the Sustainable Development Goals
- Haemoglobin levels of women and children
- Prevalence of malaria parasitaemia (rapid diagnostic testing and thick slides for malaria parasitaemia in the field and microscopy in the lab) among children age 6–59 months
- Use of treated mosquito nets
- Use of antimalarial drugs for treatment of fever among children under age 5

The information collected through the 2022 GDHS is intended to assist policymakers and programme managers in designing and evaluating programmes and strategies for improving the health of the country’s population.

Geographic coverage

National coverage

Analysis unit

Household

Individual

Children age 0-5

Woman age 15-49

Man age 15-59

Universe

The survey covered all de jure household members (usual residents), all women aged 15-49, men aged 15-59, and all children aged 0-4 resident in the household.

Kind of data

Sample survey data [ssd]

Sampling procedure

To achieve the objectives of the 2022 GDHS, a stratified representative sample of 18,540 households was selected in 618 clusters, which resulted in 15,014 interviewed women age 15–49 and 7,044 interviewed men age 15–59 (in one of every two households selected).

The sampling frame used for the 2022 GDHS is the updated frame prepared by the GSS based on the 2021 Population and Housing Census. The sampling procedure used in the 2022 GDHS was stratified two-stage cluster sampling, designed to yield representative results at the national level, for urban and rural areas, and for each of the country’s 16 regions for most DHS indicators. In the first stage, 618 target clusters were selected from the sampling frame using a probability proportional to size strategy for urban and rural areas in each region. The targeted number of clusters was then selected by equal-probability systematic random sampling from the clusters selected in the first phase, for urban and rural areas. In the second stage, after selection of the clusters, a household listing and map updating operation was carried out in all of the selected clusters to develop a list of households for each cluster. This list served as a sampling frame for selection of the household sample. The GSS organized a 5-day training course on listing procedures for listers and mappers with support from ICF. The listers and mappers were organized into 25 teams consisting of one lister and one mapper per team. The teams spent 2 months completing the listing operation. In addition to listing the households, the listers collected the geographical coordinates of each household using GPS dongles provided by ICF and in accordance with the instructions in the DHS listing manual. The household listing was carried out using tablet computers, with software provided by The DHS Program. A fixed number of 30 households in each cluster were randomly selected from the list for interviews.
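The two-stage idea above (clusters drawn with probability proportional to size, then a fixed take of 30 households per cluster) can be illustrated with a small simulation. This is a sketch only: it ignores the stratification by region and urban/rural area, the frame of 2,000 clusters and their sizes are synthetic, and `pps_systematic` is a hypothetical helper, not the procedure used by the GSS.

```python
import random

def pps_systematic(sizes, n, rng):
    """Systematic selection of n units with probability proportional to size."""
    total = sum(sizes)
    step = total / n                  # sampling interval on the cumulative size scale
    start = rng.random() * step       # random start within the first interval
    chosen, cum, idx = [], 0.0, 0
    for i in range(n):
        target = start + i * step
        # Advance to the unit whose cumulative size range contains the target.
        while idx < len(sizes) - 1 and cum + sizes[idx] <= target:
            cum += sizes[idx]
            idx += 1
        chosen.append(idx)
    return chosen

rng = random.Random(2022)
# Synthetic frame: 2,000 clusters with 100 to 300 listed households each.
frame_sizes = [rng.randint(100, 300) for _ in range(2000)]
clusters = pps_systematic(frame_sizes, 618, rng)

# Second stage: a fixed take of 30 households from each selected cluster's list.
households = {c: rng.sample(range(frame_sizes[c]), 30) for c in clusters}
total_households = sum(len(h) for h in households.values())
```

With 618 clusters and 30 households each, the simulated sample size is 618 × 30 = 18,540, matching the household total quoted in the response-rate section.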

For further details on sample design, see APPENDIX A of the final report.

Mode of data collection

Face-to-face computer-assisted interviews [capi]

Research instrument

Four questionnaires were used in the 2022 GDHS: the Household Questionnaire, the Woman’s Questionnaire, the Man’s Questionnaire, and the Biomarker Questionnaire. The questionnaires, based on The DHS Program’s model questionnaires, were adapted to reflect the population and health issues relevant to Ghana. In addition, a self-administered Fieldworker Questionnaire collected information about the survey’s fieldworkers.

The GSS organized a questionnaire design workshop with support from ICF and obtained input from government and development partners expected to use the resulting data. The DHS Program optional modules on domestic violence, malaria, and social and behavior change communication were incorporated into the Woman’s Questionnaire. ICF provided technical assistance in adapting the modules to the questionnaires.

Cleaning operations

DHS staff installed all central office programmes, data structure checks, secondary editing, and field check tables from 17–20 October 2022. Central office training was implemented using the practice data to test the central office system and field check tables. Seven GSS staff members (four male and three female) were trained on the functionality of the central office menu, including accepting clusters from the field, data editing procedures, and producing reports to monitor fieldwork.

From 27 February to 17 March, DHS staff visited the Ghana Statistical Service office in Accra to work with the GSS central office staff on finishing the secondary editing and to clean and finalize all data received from the 618 clusters.

Response rate

A total of 18,540 households were selected for the GDHS sample, of which 18,065 were found to be occupied. Of the occupied households, 17,933 were successfully interviewed, yielding a response rate of 99%. In the interviewed households, 15,317 women age 15–49 were identified as eligible for individual interviews. Interviews were completed with 15,014 women, yielding a response rate of 98%. In the subsample of households selected for the male survey, 7,263 men age 15–59 were identified as eligible for individual interviews and 7,044 were successfully interviewed.
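The response rates quoted above follow directly from the counts in the paragraph. A short sketch of the arithmetic (the helper name `response_rate` is illustrative, and the rate for men is computed here from the stated counts rather than quoted from the source):

```python
def response_rate(completed, eligible):
    """Completed interviews as a percentage of eligible units."""
    return 100.0 * completed / eligible

household_rate = response_rate(17933, 18065)  # interviewed / occupied households
women_rate = response_rate(15014, 15317)      # interviewed / eligible women
men_rate = response_rate(7044, 7263)          # interviewed / eligible men

for label, value in [("households", household_rate),
                     ("women", women_rate),
                     ("men", men_rate)]:
    print(f"{label}: {value:.1f}%")
```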

Sampling error estimates

The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2022 Ghana Demographic and Health Survey (2022 GDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2022 GDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results. A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95% of all possible samples of identical size and design.
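The relationship between variance, standard error, and the "plus or minus two standard errors" interval can be shown with a few lines of arithmetic. The estimate and variance below are made-up numbers for illustration, not values from the GDHS.

```python
import math

def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Interval of +/- multiplier standard errors around an estimate."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

# Hypothetical survey proportion and its estimated variance.
estimate = 0.40
variance = 0.000144
standard_error = math.sqrt(variance)   # standard error is the square root of the variance
low, high = confidence_interval(estimate, standard_error)
```

Here the standard error is 0.012, so the approximate 95% interval runs from about 0.376 to 0.424.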

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2022 GDHS sample was the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the GDHS 2022 is an SAS program. This program used the Taylor linearization method to estimate variances for survey estimates that are means, proportions, or ratios. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
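To make the Jackknife repeated replication idea concrete, here is a simplified delete-one-cluster sketch for a ratio estimate. It is an assumption-laden illustration: it ignores strata and sampling weights, the helper names are hypothetical, and the survey's actual SAS program and the formulas in APPENDIX B remain the authoritative description.

```python
def ratio(clusters):
    """Ratio estimate r = sum(y) / sum(x) over (y, x) cluster totals."""
    return sum(y for y, _ in clusters) / sum(x for _, x in clusters)

def jackknife_se(clusters):
    """Delete-one-cluster jackknife standard error of the ratio."""
    k = len(clusters)
    r = ratio(clusters)
    pseudo_values = []
    for i in range(k):
        r_without_i = ratio(clusters[:i] + clusters[i + 1:])
        pseudo_values.append(k * r - (k - 1) * r_without_i)
    variance = sum((p - r) ** 2 for p in pseudo_values) / (k * (k - 1))
    return variance ** 0.5

# With x = 1 in every cluster the ratio is a simple mean, so the jackknife
# standard error should equal the usual standard error of the mean.
toy_clusters = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
se = jackknife_se(toy_clusters)
```

For the toy data the result is the square root of 1/3, the same value the textbook formula gives for the mean of 1, 2, 3.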

A more detailed description of the estimates of sampling errors is presented in APPENDIX B of the survey report.

Data appraisal

Data Quality Tables

Age distribution of eligible and interviewed women

Age distribution of eligible and interviewed men

Age displacement at age 14/15

Age displacement at age 49/50

Pregnancy outcomes by years preceding the survey

Completeness of reporting

Standardisation exercise results from anthropometry training

Height and weight data completeness and quality for children

Height measurements from random subsample of measured children

Interference in height and weight measurements of children

Interference in height and weight measurements of women and men

Heaping in anthropometric measurements for children (digit preference)

Observation of mosquito nets

Observation of handwashing facility

School attendance by single year of age

Vaccination cards photographed

Number of
Predict students' dropout and academic success
kaggle.com
zip
Updated Jan 3, 2023
Cite
The Devastator (2023). Predict students' dropout and academic success [Dataset]. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention
Available download formats: zip (89332 bytes)
Dataset updated
Jan 3, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Predict students' dropout and academic success

Investigating the Impact of Social and Economic Factors

By [source]

About this dataset

This dataset provides a comprehensive view of students enrolled in various undergraduate degrees offered at a higher education institution. It includes demographic data, socio-economic factors and academic performance information that can be used to analyze the possible predictors of student dropout and academic success. The dataset draws on multiple disjoint databases and contains information available at the time of enrollment, such as application mode, marital status, course chosen and more. Additionally, the data can be used to estimate overall student performance at the end of each semester by assessing curricular units credited/enrolled/evaluated/approved as well as their respective grades. Finally, it includes the unemployment rate, inflation rate and GDP of the region, which can help show how economic factors play into student dropout rates or academic success. The data can offer insight into what motivates students to stay in school or abandon their studies across a wide range of disciplines, such as agronomy, design, education, nursing, journalism, management, social service, or technologies.

How to use the dataset

This dataset can be used to understand and predict student dropouts and academic outcomes. The data includes a variety of demographic, social-economic and academic performance factors related to the students enrolled in higher education institutions. The dataset provides valuable insights into the factors that affect student success and could be used to guide interventions and policies related to student retention.

Using this dataset, researchers can investigate two key questions: (1) which specific predictive factors are linked with student dropout or completion, and (2) how do different features interact with each other? For example, researchers could explore whether any demographic characteristics (e.g., gender, age at enrollment) or contextual conditions (e.g., unemployment rate in the region) are associated with higher student success rates, and what implications poverty has for educational outcomes. Answering these questions can give administrators information for formulating strategies that promote successful degree completion among students from diverse backgrounds in their institutions.

To use this dataset effectively, it is important that scientists familiarize themselves with all variables provided, including categorical (qualitative) variables such as gender or application mode; numerical variables such as number of curricular units at the beginning of semesters or age at enrollment; ordinal variables such as marital status; trends over time such as inflation rate or GDP; and frequency measurements such as percentage of scholarship holders. Additionally, scientists should make sure they are aware of all potential bias in the data before running an analysis, for example whether one population is underrepresented compared with another, as this could lead to unexpected results if not taken into consideration. Finally, practitioners should note that this Kaggle dataset contains only one semester's worth of information for each admission intake; studies conducted over a longer period might provide more accurate results, because retention and achievement measures may shift across successive admission years.

Research Ideas

Prediction of Student Retention: This dataset can be used to develop predictive models that identify risk factors for student dropout and support early interventions to improve retention rates.

Improved Academic Performance: By using this data, higher education institutions could better understand their students' academic progress and identify areas of improvement from both an individual and institutional perspective. This will enable them to develop targeted courses, activities, or initiatives that enhance academic performance more effectively and efficiently.

Accessibility Assistance: Using the demographic information included in the dataset, institutions could develop s...
Multimodal Head and Neck cancer dataset
cancerimagingarchive.net
n/a, svs and png
Updated Nov 18, 2025
Cite
The Cancer Imaging Archive (2025). Multimodal Head and Neck cancer dataset [Dataset]. http://doi.org/10.7937/rcty-5h16
Available download formats: svs, png, n/a
Unique identifier
https://doi.org/10.7937/rcty-5h16
Dataset updated
Nov 18, 2025
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Nov 18, 2025
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
Abstract
HANCOCK is a comprehensive, monocentric dataset of 763 head and neck cancer patients, including diverse data modalities. It contains histopathology imaging (whole-slide images of H&E-stained primary tumors and tissue microarrays with immunohistochemical staining) alongside structured clinical data (demographics, tumor pathology characteristics, laboratory blood measurements) and textual data (de-identified surgery reports and medical histories). All patients were treated curatively, and data span diagnoses from 2005–2019. This multimodal collection enables research into integrative analyses – for example, combining histologic features with clinical parameters for outcome prediction. Early analyses have demonstrated that fusing these modalities improves prognostic modeling compared to single-source data, and that leveraging histology with foundation models can enhance endpoint prediction. HANCOCK aims to facilitate precision oncology studies by providing a large public resource for developing and benchmarking multimodal machine learning methods in head and neck cancer.
Introduction
Head and neck cancer (HNC) is a prevalent malignancy with poor outcomes – it is the 7th most common cancer globally and carries a 5-year survival of only ~25–60% despite modern treatments. Improving patient prognosis may require personalized, multimodal therapy decisions, using information from pathology, clinical, and other data sources. However, progress in multimodal prediction has been limited by the lack of large public datasets that integrate these diverse data types. To our knowledge, existing HNC datasets are either small or incomplete; for example, a radiomics study included 288 oropharyngeal cases, and a proteomics-focused set with imaging had only 122 cases. The Cancer Genome Atlas (TCGA) provides multi-omics for >500 HNC cases, but lacks crucial data like pathology reports, blood tests, or comprehensive imaging for each patient. These limitations hinder robust multimodal research.
HANCOCK was created to address this gap. It aggregates 763 patients’ data from a single academic center, capturing a real-world, uniformly treated cohort. The dataset uniquely combines whole slide histopathology images, tissue microarray images, detailed clinical parameters, pathology reports, and lab values in one resource. By curating and harmonizing these modalities, HANCOCK enables researchers to explore complex data interdependencies and develop multimodal predictive models. The patient population reflects typical HNC demographics – 80% male, median age 61, with 72% being former or current smokers – aligning with expected epidemiology and supporting generalizability. In summary, HANCOCK is an unprecedented multimodal HNC dataset that can fuel research in machine learning, prognostic biomarker discovery, and integrative oncology, ultimately advancing personalized head and neck cancer care.
Methods
The following sections describe how the HANCOCK data were collected, processed, and prepared for public sharing.
Subject Inclusion and Exclusion Criteria
Patients included in HANCOCK were those diagnosed with head and neck cancer between 2005 and 2019 at University Hospital Erlangen (Germany) who underwent a curative-intent initial treatment (surgery and/or definitive therapy). This encompasses cancers of the oral cavity, oropharynx, hypopharynx, and larynx. Patients treated palliatively or with recurrent/metastatic disease at presentation were excluded to focus on first-course, curative treatments. The cohort consists of 763 patients (approximately 80% male, 20% female) with a median age of 61 years. Notably, ~72% have a history of tobacco use, which is consistent with real-world HNC risk factors. The distribution of tumor subsites and stages reflects typical HNC presentation, and thus the dataset is broadly representative of the general HNC patient population. Being a single-center dataset, there is limited geographic diversity; however, the homogeneous data acquisition and treatment context reduce variability in data quality. No significant selection biases were introduced aside from the exclusion of non-curative cases – all major HNC subsite cases over the inclusion period were captured, providing a comprehensive real-world sample. Ethical approval was obtained for this retrospective data collection and sharing (Ethics Committee vote #23-22-Br), and all data were fully de-identified prior to release.
Data Acquisition
Histopathology: Tissue specimens from the primary tumors (and involved lymph nodes, if present) were obtained from the pathology archives. All samples were formalin-fixed and paraffin-embedded (FFPE) and stained with hematoxylin and eosin (H&E) following routine protocols. Digital whole-slide imaging was performed on these histology slides. A total of 709 H&E slides of primary tumor tissue (701 patients had one slide, 8 patients had two slides) were scanned at high resolution using a 3DHISTECH P1000 scanner at an effective 82.44× magnification (0.1213 µm/pixel). Additionally, 396 H&E slides of lymph node metastases were scanned, using two systems: an Aperio Leica GT450 at 40× (0.2634 µm/pixel) and the 3DHISTECH P1000 at ~51× (0.1945 µm/pixel). (Multiple scanners were utilized over the course of the project; all resulting images were cross-verified for quality.) The digital whole slide images (WSIs) are provided in the pyramidal Aperio SVS format, a TIFF-based format compatible with standard viewers.
In addition to full slides, tissue microarrays (TMAs) were constructed from each patient’s tumor block to sample important regions. For each case, two cylindrical core biopsies (diameter 1.5 mm) were taken – one from the tumor center and one from the invasive tumor front. These cores were assembled into TMA blocks and stained on separate slides with a panel of eight stains: H&E plus immunohistochemical (IHC) markers targeting various immune cells and tumor biomarkers. The IHC markers include CD3, CD8, CD56, CD68, CD163, PD-L1, and MHC-1, which label T cells (CD3, CD8), natural killer cells (CD56), monocytes/macrophages (CD68, CD163), and a tumor immune checkpoint ligand (PD-L1), as well as MHC class I expression. Each core appears on up to 8 stained TMA slides (one per stain), yielding up to 16 TMA images per patient (two cores × eight stains). In the dataset, TMA images are provided for both the tumor-center and tumor-front cores; these too are digitized high-resolution images (consistent microscope settings, ~40×). The combination of WSIs and TMAs yields a rich imaging dataset: 701 patients have at least one primary tumor WSI (62 patients lack WSIs due to unavailable tissue), and all patients have TMA core images unless the tumor block was exhausted. This imaging data offers both broad tissue context from WSIs and targeted cellular detail from TMAs. Manual tumor region annotations are also included for the primary tumor WSIs (see Data Analysis below).
Clinical and Pathology Data: A wide array of non-imaging data was extracted from hospital information systems and pathology reports for each patient. Key demographic variables (age, sex, etc.) and tumor pathology details were collected, including primary tumor site, histologic subtype, grade, TNM stage, resection margin status, depth of invasion, perineural and lymphovascular invasion, and nodal metastasis status. These pathology parameters were recorded in a structured format for each case. Standard clinical coding systems were used where applicable: e.g., diagnoses are coded with ICD-10 codes and procedures with OPS codes (the German procedure classification system). The dataset includes these codes for each patient’s conditions and treatments. Comprehensive laboratory blood test results at diagnosis or pre-treatment were also compiled, covering complete blood counts, coagulation measures, electrolytes, kidney function, C-reactive protein, and other relevant analytes. Reference ranges for each lab parameter are provided alongside the values to indicate whether a result was normal or abnormal. Most patients have a full panel of these lab results, though some values are missing if a test was not clinically indicated; the dataset notes availability per patient. All structured data have been cleaned and validated – for example, harmonizing category values and checking consistency (e.g. TNM stages align with recorded tumor sites).
Textual Data (Surgical Reports and Histories): Unstructured clinical text was also included to add rich context on treatment details. Surgery reports (operative notes) from the primary tumor resection and associated medical history summaries were retrieved from the hospital’s electronic records. For each patient, the operative report from their first definitive surgery and the corresponding
Graphical library of population dynamics in 104 towns and villages of...
arcticdata.io
Updated Apr 15, 2024
Cite
Lawrence Hamilton (2024). Graphical library of population dynamics in 104 towns and villages of Arctic/Subarctic Alaska, 1990-2022. [Dataset]. http://doi.org/10.18739/A25H7BW29
Explore at:
Unique identifier
https://doi.org/10.18739/A25H7BW29
Dataset updated
Apr 15, 2024
Dataset provided by
Arctic Data Center
Authors
Lawrence Hamilton
Time period covered
Jan 1, 1990 - Jan 1, 2022
Description
These files contain individual graphs tracking population dynamics in 104 individual Arctic/Subarctic Alaska communities, over the years from 1990 to 2022. The numerical data underlying these graphs have been archived separately with the Arctic Data Center: Hamilton, L.C. 2023. “Annual population, natural increase and net migration for rural Alaska communities 1990–2022.” Dataset archived with the NSF Arctic Data Center. https://arcticdata.io/catalog/view/doi:10.18739/A28K74Z2B The purpose of this "graphical library" is to provide visualizations of 1990-2022 population change for each town or village in a format that is simple to download, share, and apply to other purposes such as planning, proposals or case studies. The files (identical pdf and PowerPoint versions) include a brief rationale, illustration of the numerical database organization, description of sources, citations and links to published articles, and explanation of the graphical style. These notes are followed by 104 individual graphs, one per slide, organized by boroughs or census areas.
Population at Risk of Malaria - Dataset - ENERGYDATA.INFO
energydata.info
Updated Aug 27, 2025
Cite
(2025). Population at Risk of Malaria - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/population-at-risk-of-malaria
Explore at:
Dataset updated
Aug 27, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Malaria poses a risk to approximately 3.3 billion people, or about half of the world's population. Most malaria cases occur in Sub-Saharan Africa; Asia, Latin America, and to a lesser extent the Middle East and parts of Europe are also affected. According to the Global Malaria Report published by the World Health Organization (WHO), malaria was present in 106 countries and territories in 2010, with an estimated 216 million cases and nearly 0.7 million deaths, mostly among children living in Africa. In this research, we estimated the current population exposed to malaria by country. In our computation, we distinguished areas of high, medium, and low malaria prevalence ("endemicity") in each country, based on the global malaria atlas compiled by the Malaria Atlas Project (MAP) of Oxford University. The data are based on 24,492 parasite rate surveys (Plasmodium falciparum: 24,178; Plasmodium vivax: 8,866) from an aggregated sample of 4,373,066 slides prepared from blood samples taken in 85 countries. The MAP study employs a new cartographic technique for deriving global clinical burden estimates of Plasmodium falciparum malaria for 2007. These estimates are then compared with those derived under existing surveillance-based approaches to arrive at the final data used in the malaria mapping (Hay et al., 2009; http://www.map.ox.ac.uk/media/maps/pdf/mean/World_mean.pdf, accessed 2012). Malaria maps generally separate malaria endemicity into three broad categories by Plasmodium falciparum parasite rate (PfPR), a commonly reported index of malaria transmission intensity: PfPR < 5% as low endemicity, PfPR 5%-40% as medium/intermediate endemicity, and PfPR > 40% as high endemicity. In our research, global mapping techniques were used to estimate the population exposed to malaria.
The malaria endemicity maps were overlaid on global population maps from LandScan 2005 (Dobson, 2000), and country-level population exposure in the three endemicity areas was computed. Because of the spatial reference of the data and the number of observations in the combined data, Geographic Information Systems functions from ESRI ArcGIS (v 9.3.1) were used and automated in the Python (v 2.5) language.
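The overlay step can be illustrated with a minimal sketch in plain Python. It is not the authors' ArcGIS workflow: it assumes the two rasters have already been aligned into per-cell records, and the function names, the cell tuples, and the handling of a PfPR of exactly 5% or 40% are assumptions made for illustration.

```python
from collections import defaultdict


def endemicity_class(pfpr_percent):
    """Map a PfPR value (in percent) to low / medium / high endemicity.

    Thresholds follow the description: < 5% low, 5%-40% medium, > 40% high.
    """
    if pfpr_percent < 5:
        return "low"
    if pfpr_percent <= 40:
        return "medium"
    return "high"


def population_exposed(cells):
    """Sum population per country and endemicity class.

    `cells` is an iterable of (country, pfpr_percent, population) tuples,
    one per grid cell where the endemicity and population layers overlap.
    """
    totals = defaultdict(lambda: {"low": 0, "medium": 0, "high": 0})
    for country, pfpr_percent, population in cells:
        totals[country][endemicity_class(pfpr_percent)] += population
    return {country: dict(by_class) for country, by_class in totals.items()}


# Hypothetical grid cells, invented for illustration only.
cells = [
    ("A", 2.0, 1000),
    ("A", 12.5, 500),
    ("A", 55.0, 250),
    ("B", 40.0, 300),
    ("B", 4.9, 700),
]
result = population_exposed(cells)
```

Here `result` holds one dictionary per country with the summed population in each of the three endemicity classes. Cells outside any malaria-endemic area would need to be filtered out before this step.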
CA Zip Code Boundaries
data.ca.gov
gis.data.ca.gov
+1 more
Updated Apr 16, 2025
Cite
California Department of Technology (2025). CA Zip Code Boundaries [Dataset]. https://data.ca.gov/dataset/ca-zip-code-boundaries
Explore at:
Available download formats: csv, arcgis geoservices rest api, geojson, gpkg, html, zip, txt, kml, gdb, xlsx
Dataset updated
Apr 16, 2025
Dataset authored and provided by
California Department of Technology http://cdt.ca.gov/
Area covered
California
Description
This feature service is derived from the Esri "United States Zip Code Boundaries" layer, queried to only CA data.

For the original data see: https://esri.maps.arcgis.com/home/item.html?id=5f31109b46d541da86119bd4cf213848

Published by the California Department of Technology Geographic Information Services Team.
The GIS Team can be reached at ODSdataservices@state.ca.gov.

U.S. ZIP Code Boundaries represents five-digit ZIP Code areas used by the U.S. Postal Service to deliver mail more effectively. The first digit of a five-digit ZIP Code divides the United States into 10 large groups of states (or equivalent areas) numbered from 0 in the Northeast to 9 in the far West. Within these areas, each state is divided into an average of 10 smaller geographical areas, identified by the second and third digits. These digits, in conjunction with the first digit, represent a Sectional Center Facility (SCF) or a mail processing facility area. The fourth and fifth digits identify a post office, station, branch or local delivery area.
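The digit structure described above can be sketched as a small parser. The function name and the dictionary keys are hypothetical labels chosen for illustration; the ZIP Code is handled as a string so that leading zeros are preserved.

```python
def split_zip(zip_code):
    """Split a five-digit ZIP Code into the parts described above."""
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError("expected a five-digit ZIP Code string")
    return {
        # First digit: one of 10 large groups of states.
        "national_area": zip_code[0],
        # First three digits: Sectional Center Facility (SCF) or
        # mail processing facility area.
        "scf": zip_code[:3],
        # Fourth and fifth digits: post office, station, branch,
        # or local delivery area.
        "delivery_area": zip_code[3:],
    }


parts = split_zip("95814")
```

For the example input, the first digit is 9, the SCF prefix is 958, and the local delivery digits are 14.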

As of the time this layer was published, in January 2025, Esri's boundaries are sourced from TomTom (June 2024) and the 2023 population estimates are from Esri Demographics. Esri updates its layer annually and those changes will immediately be reflected in this layer. Note that, because this layer passes through Esri's data, if you want to know the true date of the underlying data, click through to Esri's original source data and look at their metadata for more information on updates.

Cautions about using Zip Code boundary data
Zip code boundaries have three characteristics you should be aware of before using them:
Zip code boundaries change, in ways small and large - these are not a stable analysis unit. Data you received keyed to zip codes may have used an earlier and very different boundary for your zip codes of interest.
Historically, the United States Postal Service has not published zip code boundaries, and instead, boundary datasets are compiled by third party vendors from address data. That means that the boundary data are not authoritative, and any data you have keyed to zip codes may use a different, vendor-specific method for generating boundaries from the data here.
Zip codes are designed to optimize mail delivery, not to reflect social, environmental, or demographic characteristics. Analysis using zip codes is subject to the Modifiable Areal Unit Problem, which will bias any results because your units of analysis aren't designed for the data being studied.
As of early 2025, USPS appears to be in the process of releasing boundaries, which will at least provide an authoritative source, but because of the other factors above, we do not recommend these boundaries for many use cases. If you are using these for anything other than mailing purposes, we recommend reconsideration. We provide the boundaries as a convenience, knowing people are looking for them, in order to ensure that up-to-date boundaries are available.
2015 - Survey - National opinion trends
opalpro.cs.upb.de
data.europa.eu
Updated Feb 13, 2017
+ more versions
Cite
diceupb (2017). 2015 - Survey - National opinion trends [Dataset]. http://opalpro.cs.upb.de:5000/vi/dataset/2015_-_survey_-_national_opinion_trends
Explore at:
Available download formats: http://publications.europa.eu/resource/authority/file-type/pdf
Dataset updated
Feb 13, 2017
Dataset provided by
diceupb
License
http://publications.europa.eu/resource/authority/licence/COM_REUSE
Description
A national analysis has been carried out as a follow-up to the exploratory study ‘Major changes in European public opinion with regard to the EU’, which showed how public opinion had changed in the 28 Member States since 1973.

The new national analysis is made up of three PowerPoint presentations that show how public opinion in each of the Member States has changed since 2007.

The first presentation, ‘national public opinion trends’, analyses how the answers to key Eurobarometer questions changed in each Member State between 2007 and 2015, in particular: the image of the EP, the role of the EP and membership of the EU.

The second presentation, which also focuses on individual Member States, is devoted to socio-demographic trends. It shows the main differences between the EU average and the national results for the key questions referred to above and for others. It breaks trends down by gender, age and socio-professional category.

The third presentation deals more specifically with topics relating to ‘identity and EU citizenship’. The changes in public opinion between 2007 and 2015 are dealt with on a national basis and compared with the European average. On a socio-demographic level, a specific analysis was made of the differences between age groups.
Data from: Contrasting demographic processes underlie uphill shifts in a...
datadryad.org
data.niaid.nih.gov
zip
Updated Oct 21, 2024
Cite
Sarah Skikne; Blair McLaughlin; Mark Fisher; David Ackerly; Erika Zavaleta (2024). Contrasting demographic processes underlie uphill shifts in a desert ecosystem [Dataset]. http://doi.org/10.5061/dryad.pk0p2ngz6
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.pk0p2ngz6
Dataset updated
Oct 21, 2024
Dataset provided by
Dryad
Authors
Sarah Skikne; Blair McLaughlin; Mark Fisher; David Ackerly; Erika Zavaleta
Time period covered
Oct 1, 2024
Description
Contrasting demographic processes underlie uphill shifts in a desert ecosystem

https://doi.org/10.5061/dryad.pk0p2ngz6

Description of the data and file structure

Files and variables

File: individ_plant_outcomes.csv

Description: Data on demographic outcomes for individual plants extracted from paired historical-modern photos taken along the Deep Canyon Transect in Riverside County, California.

Variables

site: site name. Map of sites can be found in the Supplementary Materials of Skikne et al. 2024. Contrasting demographic processes underlie uphill shifts in a desert ecosystem. Ecology.

spp: species

extant_t1: whether the plant existed in the historical photo (1) or not (0)

alive_t1: whether the plant was alive in the historical photo (1) or not (0)

height_t1: height of the plant in the historical photo (unitless, see below)

width_t1: width of the plant in the historical photo (unitless, see below)...
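As a rough illustration of how the variables listed above might be read, the sketch below parses a few rows with Python's standard `csv` module. The variable list here is truncated, so only the columns shown are used. The sample rows, the assumption that the file has a header row with these column names, and the treatment of empty cells as missing values are all hypothetical.

```python
import csv
import io

# Invented sample rows; the real file is individ_plant_outcomes.csv.
sample = io.StringIO(
    "site,spp,extant_t1,alive_t1,height_t1,width_t1\n"
    "site_a,species_x,1,1,0.42,0.30\n"
    "site_a,species_y,1,0,0.10,0.08\n"
    "site_b,species_x,0,0,,\n"
)


def to_float(text):
    """Convert a cell to float, treating an empty cell as missing."""
    return float(text) if text else None


def read_outcomes(handle):
    """Parse rows, turning the 0/1 flags into booleans."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "site": row["site"],
            "spp": row["spp"],
            "extant_t1": row["extant_t1"] == "1",
            "alive_t1": row["alive_t1"] == "1",
            "height_t1": to_float(row["height_t1"]),
            "width_t1": to_float(row["width_t1"]),
        })
    return rows


rows = read_outcomes(sample)

# Count plants that were alive in the historical photo, per site.
alive_by_site = {}
for r in rows:
    if r["alive_t1"]:
        alive_by_site[r["site"]] = alive_by_site.get(r["site"], 0) + 1
```

To read the actual file, replace `sample` with an opened file handle, for example `open("individ_plant_outcomes.csv", newline="")`.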
Regions of Homozygosity in the Porcine Genome: Consequence of Demography and...
figshare.com
datasetcatalog.nlm.nih.gov
tiff
Updated May 31, 2023
Cite
Mirte Bosse; Hendrik-Jan Megens; Ole Madsen; Yogesh Paudel; Laurent A. F. Frantz; Lawrence B. Schook; Richard P. M. A. Crooijmans; Martien A. M. Groenen (2023). Regions of Homozygosity in the Porcine Genome: Consequence of Demography and the Recombination Landscape [Dataset]. http://doi.org/10.1371/journal.pgen.1003100
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pgen.1003100
Dataset updated
May 31, 2023
Dataset provided by
PLOS Genetics
Authors
Mirte Bosse; Hendrik-Jan Megens; Ole Madsen; Yogesh Paudel; Laurent A. F. Frantz; Lawrence B. Schook; Richard P. M. A. Crooijmans; Martien A. M. Groenen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Inbreeding has long been recognized as a primary cause of fitness reduction in both wild and domesticated populations. Consanguineous matings cause inheritance of haplotypes that are identical by descent (IBD) and result in homozygous stretches along the genome of the offspring. Size and position of regions of homozygosity (ROHs) are expected to correlate with genomic features such as GC content and recombination rate, but also direction of selection. Thus, ROHs should be non-randomly distributed across the genome. Therefore, demographic history may not fully predict the effects of inbreeding. The porcine genome has a relatively heterogeneous distribution of recombination rate, making Sus scrofa an excellent model to study the influence of both recombination landscape and demography on genomic variation. This study utilizes next-generation sequencing data for the analysis of genomic ROH patterns, using a comparative sliding window approach. We present an in-depth study of genomic variation based on three different parameters: nucleotide diversity outside ROHs, the number of ROHs in the genome, and the average ROH size. We identified an abundance of ROHs in all genomes of multiple pigs from commercial breeds and wild populations from Eurasia. Size and number of ROHs are in agreement with known demography of the populations, with population bottlenecks highly increasing ROH occurrence. Nucleotide diversity outside ROHs is high in populations derived from a large ancient population, regardless of current population size. In addition, we show an unequal genomic ROH distribution, with strong correlations of ROH size and abundance with recombination rate and GC content. Global gene content does not correlate with ROH frequency, but some ROH hotspots do contain positive selected genes in commercial lines and wild populations. 
This study highlights the importance of the influence of demography and recombination on homozygosity in the genome to understand the effects of inbreeding.
S2 Table -
plos.figshare.com
figshare.com
xlsx
Updated Jun 16, 2023
Cite
Begoña Martínez-Cruz; Hanna Zalewska; Andrzej Zalewski (2023). S2 Table - [Dataset]. http://doi.org/10.1371/journal.pone.0266161.s003
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0266161.s003
Dataset updated
Jun 16, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Begoña Martínez-Cruz; Hanna Zalewska; Andrzej Zalewski
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sliding-window cohorts for (A) microsatellites and (B) mtDNA sequences. Samples were subdivided into cohorts of four years; n indicates the number of individuals in each cohort. Only cohorts comprising eight or more individuals are listed here and were considered in the analyses. (XLSX)
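A minimal sketch of how such cohorts could be assembled from sampling years is given below. The four-year window and the minimum of eight individuals come from the description; the one-year step between windows, the function name, and the sample years are assumptions, since the table description does not state how the windows advance.

```python
def sliding_cohorts(sample_years, window=4, min_n=8, step=1):
    """Return (first_year, last_year, n) for each retained cohort.

    A cohort spans `window` consecutive years; cohorts with fewer
    than `min_n` individuals are dropped.
    """
    years = sorted(sample_years)
    if not years:
        return []
    cohorts = []
    last_start = years[-1] - window + 1
    for start in range(years[0], last_start + 1, step):
        end = start + window - 1
        n = sum(1 for year in years if start <= year <= end)
        if n >= min_n:
            cohorts.append((start, end, n))
    return cohorts


# Hypothetical sampling years, one entry per individual.
years = [2000] * 3 + [2001] * 3 + [2002] * 2 + [2003, 2004, 2005]
cohorts = sliding_cohorts(years)
```

With these invented years, only the 2000 to 2003 window reaches eight individuals, so it is the only cohort kept.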
Survey of Consumer Finances
federalreserve.gov
Updated Oct 18, 2023
+ more versions
Cite
Board of Governors of the Federal Reserve Board (2023). Survey of Consumer Finances [Dataset]. http://doi.org/10.17016/8799
Explore at:
Unique identifier
https://doi.org/10.17016/8799
Dataset updated
Oct 18, 2023
Dataset provided by
Federal Reserve Board of Governors
Federal Reserve System http://www.federalreserve.gov/
Authors
Board of Governors of the Federal Reserve Board
Time period covered
1962 - 2023
Description
The Survey of Consumer Finances (SCF) is normally a triennial cross-sectional survey of U.S. families. The survey data include information on families' balance sheets, pensions, income, and demographic characteristics.
India Population
tradingeconomics.com
id.tradingeconomics.com
+13 more
csv, excel, json, xml
Updated Oct 10, 2012
Cite
TRADING ECONOMICS (2012). India Population [Dataset]. https://tradingeconomics.com/india/population
Explore at:
Available download formats: json, excel, xml, csv
Dataset updated
Oct 10, 2012
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 31, 1950 - Dec 31, 2024
Area covered
India
Description
The total population in India was estimated at 1398.6 million people in 2024, according to the latest census figures and projections from Trading Economics. This dataset provides - India Population - actual values, historical data, forecast, chart, statistics, economic calendar and news.
GLA Demography - Comparison of available population estimates
gimi9.com
Updated Apr 5, 2023
+ more versions
Cite
(2023). GLA Demography - Comparison of available population estimates [Dataset]. https://gimi9.com/dataset/london_comparison-of-available-population-estimates/
Explore at:
Dataset updated
Apr 5, 2023
Description
At the April 2023 meeting of the Population Statistics User Group, the GLA Demography team presented an overview of currently available sources of population estimates for the previous decade, namely:
The original ONS mid-year population estimates (including rolled-forward estimates for 2021)
Experimental outputs from the ONS's Dynamic Population Model
The modelled population backseries produced by the GLA to act as inputs to our 2021-based interim population projections
The slides from the presentation are published here together with packages of comparison plots for all local authority districts and regions in England to allow users to easily view some of the key differences between the sources for their own areas. The plots also include comparisons of the Dynamic Population Model's provisional 2022 estimates of births with the modelled estimates of recent births produced by the GLA.