13 datasets found

Distribution of blood types in the U.S. as of 2023
statista.com
Updated Mar 18, 2025
Cite
Statista (2025). Distribution of blood types in the U.S. as of 2023 [Dataset]. https://www.statista.com/statistics/1112664/blood-type-distribution-us/
Dataset updated
Mar 18, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
The eight main blood types are A+, A-, B+, B-, O+, O-, AB+, and AB-. The most common blood type in the United States is O-positive, with around 38 percent of the population having this type of blood. However, the share varies by ethnicity: around 53 percent of Latino-Americans have blood type O-positive, compared to 47 percent of African Americans and 37 percent of Caucasians.

Blood donation

The American Red Cross estimates that every two seconds someone in the United States needs blood or platelets, highlighting the importance of blood donation. It was estimated that in 2021, around 6.5 million people in the U.S. donated blood, with around 1.7 million of these people donating for the first time. Those with blood type O-negative are universal blood donors, meaning their blood can be transfused to patients of any blood type. Therefore, this blood type is the most requested by hospitals. However, only about seven percent of the U.S. population has this blood type.

Blood transfusion

Blood transfusion is a routine procedure that involves adding donated blood to a patient’s body. There are many reasons why a patient may need a blood transfusion, including surgery, cancer treatment, severe injury, or chronic illness. In 2021, there were around 10.76 million blood transfusions in the United States. Most blood transfusions in the United States occur in an inpatient medicine setting, while critical care accounts for the second highest number of transfusions.
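The "universal donor" statement can be illustrated with a small sketch. It assumes the simplified textbook rule for red-cell transfusion (a donor is compatible when their blood carries no A, B, or RhD antigen that the recipient lacks); the function names are hypothetical and are not part of the Statista dataset.

```python
# Simplified red-cell compatibility rule (an assumption for illustration):
# a donor is compatible when they carry no antigen the recipient lacks.
BLOOD_TYPES = ["O-", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+"]


def antigens(blood_type: str) -> frozenset:
    """Return the antigens implied by a label such as 'AB+' or 'O-'."""
    abo, rh = blood_type[:-1], blood_type[-1]
    found = {letter for letter in abo if letter in "AB"}  # 'O' adds nothing
    if rh == "+":
        found.add("RhD")
    return frozenset(found)


def can_donate(donor: str, recipient: str) -> bool:
    """True when every donor antigen is also present in the recipient."""
    return antigens(donor) <= antigens(recipient)


def recipients_of(donor: str) -> list:
    """List every blood type that can receive red cells from the donor."""
    return [r for r in BLOOD_TYPES if can_donate(donor, r)]


print(recipients_of("O-"))   # all eight types
print(recipients_of("AB+"))  # only AB+
```

Under this rule O-negative blood has no A, B, or RhD antigens, so it passes the check for every recipient type, which matches the description above.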
N-BGP (Noninvasive Blood Group Prediction Dataset)
ieee-dataport.org
Updated Jul 5, 2023
Cite
Prem Verma (2023). N-BGP (Noninvasive Blood Group Prediction Dataset) [Dataset]. https://ieee-dataport.org/documents/n-bgp-noninvasive-blood-group-prediction-dataset
Dataset updated
Jul 5, 2023
Authors
Prem Verma
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
platelet
Complete blood count of the household population
www150.statcan.gc.ca
open.canada.ca
+1 more
Updated Mar 5, 2021
Cite
Government of Canada, Statistics Canada (2021). Complete blood count of the household population [Dataset]. http://doi.org/10.25318/1310033301-eng
Unique identifier
https://doi.org/10.25318/1310033301-eng
Dataset updated
Mar 5, 2021
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Complete blood count of the household population, by sex and age group.
Healthcare Dataset
opendatabay.com
.csv
Updated Jun 6, 2025
+ more versions
Cite
Datasimple (2025). Healthcare Dataset [Dataset]. https://www.opendatabay.com/data/dataset/953c80ef-162d-467b-ae1c-867d0f9c490d
Available download formats: .csv
Dataset updated
Jun 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Healthcare Insurance & Costs
Description
Context: This synthetic healthcare dataset has been created to serve as a valuable resource for data science, machine learning, and data analysis enthusiasts. It is designed to mimic real-world healthcare data, enabling users to practice, develop, and showcase their data manipulation and analysis skills in the context of the healthcare industry.

Inspiration: The inspiration behind this dataset is rooted in the need for practical and diverse healthcare data for educational and research purposes. Healthcare data is often sensitive and subject to privacy regulations, making it challenging to access for learning and experimentation. To address this gap, I have leveraged Python's Faker library to generate a dataset that mirrors the structure and attributes commonly found in healthcare records. By providing this synthetic data, I hope to foster innovation, learning, and knowledge sharing in the healthcare analytics domain.

Dataset Information: Each column provides specific information about the patient, their admission, and the healthcare services provided, making this dataset suitable for various data analysis and modeling tasks in the healthcare domain. Here's a brief explanation of each column in the dataset -

Name: This column represents the name of the patient associated with the healthcare record.
Age: The age of the patient at the time of admission, expressed in years.
Gender: Indicates the gender of the patient, either "Male" or "Female."
Blood Type: The patient's blood type, which can be one of the common blood types (e.g., "A+", "O-", etc.).
Medical Condition: This column specifies the primary medical condition or diagnosis associated with the patient, such as "Diabetes," "Hypertension," "Asthma," and more.
Date of Admission: The date on which the patient was admitted to the healthcare facility.
Doctor: The name of the doctor responsible for the patient's care during their admission.
Hospital: Identifies the healthcare facility or hospital where the patient was admitted.
Insurance Provider: This column indicates the patient's insurance provider, which can be one of several options, including "Aetna," "Blue Cross," "Cigna," "UnitedHealthcare," and "Medicare."
Billing Amount: The amount of money billed for the patient's healthcare services during their admission. This is expressed as a floating-point number.
Room Number: The room number where the patient was accommodated during their admission.
Admission Type: Specifies the type of admission, which can be "Emergency," "Elective," or "Urgent," reflecting the circumstances of the admission.
Discharge Date: The date on which the patient was discharged from the healthcare facility, based on the admission date and a random number of days within a realistic range.
Medication: Identifies a medication prescribed or administered to the patient during their admission. Examples include "Aspirin," "Ibuprofen," "Penicillin," "Paracetamol," and "Lipitor."
Test Results: Describes the results of a medical test conducted during the patient's admission. Possible values include "Normal," "Abnormal," or "Inconclusive," indicating the outcome of the test.
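The column rules above can be sketched in a few lines. This is only an illustration using Python's standard library rather than Faker, and the numeric ranges (age 18 to 90, a stay of 1 to 30 days, billing between 1,000 and 50,000, admissions in 2023) are assumptions made for the example, not values documented by the dataset author.

```python
import random
from datetime import date, timedelta

BLOOD_TYPES = ["A+", "A-", "B+", "B-", "O+", "O-", "AB+", "AB-"]
ADMISSION_TYPES = ["Emergency", "Elective", "Urgent"]
TEST_RESULTS = ["Normal", "Abnormal", "Inconclusive"]


def make_record(rng: random.Random, max_stay_days: int = 30) -> dict:
    """Build one synthetic record covering a subset of the listed columns."""
    # Assumed admission window: any day in 2023.
    admitted = date(2023, 1, 1) + timedelta(days=rng.randrange(365))
    # Discharge date = admission date + a random stay (assumed 1..max_stay_days).
    stay = rng.randint(1, max_stay_days)
    return {
        "Age": rng.randint(18, 90),
        "Blood Type": rng.choice(BLOOD_TYPES),
        "Admission Type": rng.choice(ADMISSION_TYPES),
        "Date of Admission": admitted,
        "Discharge Date": admitted + timedelta(days=stay),
        "Billing Amount": round(rng.uniform(1000.0, 50000.0), 2),
        "Test Results": rng.choice(TEST_RESULTS),
    }


rng = random.Random(0)  # fixed seed so the sample is reproducible
records = [make_record(rng) for _ in range(5)]
for record in records:
    print(record)
```

Deriving the discharge date from the admission date keeps every record internally consistent, which is the property the Discharge Date column description emphasizes.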
Usage Scenarios: This dataset can be utilized for a wide range of purposes, including:

Developing and testing healthcare predictive models.
Practicing data cleaning, transformation, and analysis techniques.
Creating data visualizations to gain insights into healthcare trends.
Learning and teaching data science and machine learning concepts in a healthcare context.
You can treat it as a multi-class classification problem and solve it for Test Results, which contains 3 categories (Normal, Abnormal, and Inconclusive).

Acknowledgments: I acknowledge the importance of healthcare data privacy and security and emphasize that this dataset is entirely synthetic. It does not contain any real patient information or violate any privacy regulations. I hope that this dataset contributes to the advancement of data science and healthcare analytics and inspires new ideas. Feel free to explore, analyze, and share your findings with the Kaggle community.

Original Data Source: Healthcare Dataset
Blood Cell Detection_new Small Dataset
universe.roboflow.com
zip
Updated Jun 3, 2022
Cite
Dewinter (2022). Blood Cell Detection_new Small Dataset [Dataset]. https://universe.roboflow.com/dewinter/blood-cell-detection_new-small-dataset/dataset/1
Available download formats: zip
Dataset updated
Jun 3, 2022
Dataset authored and provided by
Dewinter
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
White Blood Cells Bounding Boxes
Description
Here are a few use cases for this project:

Medical Research: The model could be used by biologists and medical researchers to assist in studying blood cells, understanding their distribution and behavior, as well as helping in the development of treatments for blood-related diseases by automating the blood cell detection process.

Disease Diagnosis and Monitoring: Clinicians or healthcare professionals can use the model to analyze patient blood samples to identify abnormalities in white blood cell quantities, helping in diagnosing potential diseases like leukemia, lymphoma, or infections.

Educational Tool: This model can serve as a powerful educational tool for biology students or medical trainees, allowing them to visualize and identify different white blood cell types, thus improving their understanding and recognition skills.

Pharmaceutical Testing: In pharmaceutical research, the model could be applied to assess the impact of various drugs or treatments on white blood cell count and distinguish their effects on different types of white blood cells, a vital step in the pre-clinical and clinical testing phases.

Veterinary Use: It could be employed by veterinarians to analyze blood samples of animals for diagnosing infections, diseases, or understanding the animal's overall immunity status by counting and classifying different types of white blood cells.
Data from: Database for Forensic Anthropology in the United States,...
catalog.data.gov
datasets.ai
+1 more
Updated Mar 12, 2025
Cite
National Institute of Justice (2025). Database for Forensic Anthropology in the United States, 1962-1991 [Dataset]. https://catalog.data.gov/dataset/database-for-forensic-anthropology-in-the-united-states-1962-1991-486d3
Dataset updated
Mar 12, 2025
Dataset provided by
National Institute of Justice
Area covered
United States
Description
This project was undertaken to establish a computerized skeletal database composed of recent forensic cases to represent the present ethnic diversity and demographic structure of the United States population. The intent was to accumulate a forensic skeletal sample large and diverse enough to reflect different socioeconomic groups of the general population from different geographical regions of the country in order to enable researchers to revise the standards being used for forensic skeletal identification. The database is composed of eight data files, comprising four categories.

The primary "biographical" or "identification" files (Part 1, Demographic Data, and Part 2, Geographic and Death Data) comprise the first category of information and pertain to the positive identification of each of the 1,514 data records in the database. Information in Part 1 includes sex, ethnic group affiliation, birth date, age at death, height (living and cadaver), and weight (living and cadaver). Variables in Part 2 pertain to the nature of the remains, means and sources of identification, city and state/country born, occupation, date missing/last seen, date of discovery, date of death, time since death, cause of death, manner of death, deposit/exposure of body, area found, city, county, and state/country found, handedness, and blood type.

The Medical History File (Part 3) represents the second category of information and contains data on the documented medical history of the individual. Variables in Part 3 include general comments on medical history as well as comments on congenital malformations, dental notes, bone lesions, perimortem trauma, and other comments.

The third category consists of an inventory file (Part 4, Skeletal Inventory Data) in which data pertaining to the specific contents of the database are maintained. This includes the inventory of skeletal material by element and side (left and right), indicating the condition of the bone as either partial or complete. The variables in Part 4 provide a skeletal inventory of the cranium, mandible, dentition, and postcranium elements and identify the element as complete, fragmentary, or absent. If absent, four categories record why it is missing.

The last part of the database is composed of three skeletal data files, covering quantitative observations of age-related changes in the skeleton (Part 5), cranial measurements (Part 6), and postcranial measurements (Part 7). Variables in Part 5 provide assessments of epiphyseal closure and cranial suture closure (left and right), rib end changes (left and right), Todd Pubic Symphysis, Suchey-Brooks Pubic Symphysis, McKern & Stewart--Phases I, II, and III, Gilbert & McKern--Phases I, II, and III, auricular surface, and dorsal pubic pitting (all for left and right). Variables in Part 6 include cranial measurements (length, breadth, height) and mandibular measurements (height, thickness, diameter, breadth, length, and angle) of various skeletal elements. Part 7 provides postcranial measurements (length, diameter, breadth, circumference, and left and right, where appropriate) of the clavicle, scapula, humerus, radius, ulna, sacrum, innominate, femur, tibia, fibula, and calcaneus. A small file of noted problems for a few cases is also included (Part 8).
Blood Cancer - Image Dataset
kaggle.com
Updated Jan 27, 2023
Cite
Akhil (2023). Blood Cancer - Image Dataset [Dataset]. https://www.kaggle.com/datasets/akhiljethwa/blood-cancer-image-dataset/suggestions
Dataset updated
Jan 27, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Akhil
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
This dataset contains 10,000 single-cell images (64x64 pixels) taken from peripheral blood smears of patients diagnosed with Acute Myeloid Leukemia (Blood Cancer).

The images were obtained from The Cancer Imaging Archive (TCIA).

Citations:

Matek, C., Schwarz, S., Marr, C., & Spiekermann, K. (2019). A Single-cell Morphological Dataset of Leukocytes from AML Patients and Non-malignant Controls [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/tcia.2019.36f5o9ld

Matek, C., Schwarz, S., Spiekermann, K. et al. Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nat Mach Intell 1, 538–544 (2019). https://doi.org/10.1038/s42256-019-0101-9

https://www.medanta.org/patient-education-blog/common-types-of-blood-cancer/
High blood pressure, by age group and sex, household population aged 12 and...
open.canada.ca
www150.statcan.gc.ca
+1 more
csv, html, xml
Updated Jan 17, 2023
Cite
Statistics Canada (2023). High blood pressure, by age group and sex, household population aged 12 and over, Canada and provinces [Dataset]. https://open.canada.ca/data/en/dataset/8c82c279-ebb1-4517-a898-81a3ac27faeb
Available download formats: csv, xml, html
Dataset updated
Jan 17, 2023
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Area covered
Canada
Description
This table contains 14784 series, with data for years 1994 - 1998 (not all combinations necessarily have data for all years). This table contains data described by the following dimensions (not all combinations are available): Geography (11 items: Canada; Newfoundland and Labrador; Prince Edward Island; Nova Scotia ...), Age group (14 items: Total; 12 years and over; 12-19 years; 15-19 years; 12-14 years ...), Sex (3 items: Both sexes; Males; Females ...), High blood pressure (4 items: Total population for the variable high blood pressure; Without high blood pressure; High blood pressure, not stated; With high blood pressure ...), Characteristics (8 items: Number of persons; High 95% confidence interval - number of persons; Coefficient of variation for number of persons; Low 95% confidence interval - number of persons ...).
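The series count follows directly from the dimension sizes listed in the description: one series per combination of items. A short check, using only the item counts stated above (the dictionary name is illustrative):

```python
from math import prod

# Item counts per dimension, as stated in the table description.
dimension_sizes = {
    "Geography": 11,
    "Age group": 14,
    "Sex": 3,
    "High blood pressure": 4,
    "Characteristics": 8,
}

# One series exists for each combination of items across the dimensions.
series_count = prod(dimension_sizes.values())
print(series_count)  # 14784
```

The product 11 x 14 x 3 x 4 x 8 equals 14,784, which agrees with the number of series the description reports.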
Occurrence of blood feeding terrestrial leeches in a degraded forest...
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Swinfield, Tom (2020). Occurrence of blood feeding terrestrial leeches in a degraded forest ecosystem [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2536268
Dataset updated
Jan 24, 2020
Dataset provided by
Deere, Nicolas J
Swinfield, Tom
Drinkwater, Rosie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description: This dataset includes the abundance of two species of terrestrial leech collected at multiple sites at the SAFE project in Sabah, Malaysia. Leech collections took place over two seasons, one in the dry season of 2015 and one in the wet season of 2016. For each of the sites, four repeated visits took place and 20-minute searches were conducted within the boundaries of 25 m² vegetation plots. As these sites have been subjected to different degrees of current and historic degradation, the vegetation structure data is also included for each site. For a subset of the leech sites there is corresponding mammal detection data from camera traps across the landscape, which is also included in this dataset.

Project: This dataset was collected as part of the following SAFE research project: The effects of rainforest fragmentation on mammal community assemblages using leech blood-meal analysis

Funding: These data were collected as part of research funded by:

NERC (Standard grant, NE/K016148/1)

This dataset is released under the CC-BY 4.0 licence, requiring that you cite the dataset in any outputs, but has the additional condition that you acknowledge the contribution of these funders in any outputs.

XML metadata: GEMINI compliant metadata for this dataset is available here

Files: This consists of 1 file: Drinkwater2019_leech_occurrence.v2.xlsx

Drinkwater2019_leech_occurrence.v2.xlsx

This file contains dataset metadata and 4 data tables:

Leech abundance and survey-covariates 2015 (described in worksheet abundance2015)

Description: This dataset has the abundance of all the leech individuals of both species collected during surveys in 2015 between February and June. The number of leeches collected is split by species of leech and by each of the four visits per site. For each survey at a site the associated survey-specific covariates are included. These are the associated effort (number of people collecting the leeches) and the date the visits happened (Julian day since the beginning of the year).

Number of fields: 17
Number of data rows: 169
Fields:

site: SAFE second order point (Field type: location)
visit_B1: Number of brown leeches collected during first visit to each site (Field type: abundance)
visit_B2: Number of brown leeches collected during second visit to each site (Field type: abundance)
visit_B3: Number of brown leeches collected during third visit to each site (Field type: abundance)
visit_B4: Number of brown leeches collected during fourth visit to each site (Field type: abundance)
visit_T1: Number of tiger leeches collected during first visit to each site (Field type: abundance)
visit_T2: Number of tiger leeches collected during second visit to each site (Field type: abundance)
visit_T3: Number of tiger leeches collected during third visit to each site (Field type: abundance)
visit_T4: Number of tiger leeches collected during fourth visit to each site (Field type: abundance)
eff_1: Number of people collecting leeches per survey as a measure of survey effort for the first visit to each site (Field type: abundance)
eff_2: Number of people collecting leeches per survey as a measure of survey effort for the second visit to each site (Field type: abundance)
eff_3: Number of people collecting leeches per survey as a measure of survey effort for the third visit to each site (Field type: abundance)
eff_4: Number of people collecting leeches per survey as a measure of survey effort for the fourth visit to each site (Field type: abundance)
date.1: Julian date of visit 1 (Field type: numeric)
date.2: Julian date of visit 2 (Field type: numeric)
date.3: Julian date of visit 3 (Field type: numeric)
date.4: Julian date of visit 4 (Field type: numeric)

Leech abundance and survey-covariates 2016 (described in worksheet abundance2016)

Description: This dataset has the abundance of all the leech individuals of both species collected during surveys in 2016 between September and December. The number of leeches collected is split by species of leech and by each of the four visits per site. For each survey at a site the associated survey-specific covariates are included. These are the associated effort (number of people collecting the leeches) and the date the visits happened (Julian day since the beginning of the year).

Number of fields: 17
Number of data rows: 169
Fields:

site: SAFE second order point (Field type: location)
visit_B1: Number of brown leeches collected during first visit to each site (Field type: abundance)
visit_B2: Number of brown leeches collected during second visit to each site (Field type: abundance)
visit_B3: Number of brown leeches collected during third visit to each site (Field type: abundance)
visit_B4: Number of brown leeches collected during fourth visit to each site (Field type: abundance)
visit_T1: Number of tiger leeches collected during first visit to each site (Field type: abundance)
visit_T2: Number of tiger leeches collected during second visit to each site (Field type: abundance)
visit_T3: Number of tiger leeches collected during third visit to each site (Field type: abundance)
visit_T4: Number of tiger leeches collected during fourth visit to each site (Field type: abundance)
eff_1: Number of people collecting leeches per survey as a measure of survey effort for the first visit to each site (Field type: abundance)
eff_2: Number of people collecting leeches per survey as a measure of survey effort for the second visit to each site (Field type: abundance)
eff_3: Number of people collecting leeches per survey as a measure of survey effort for the third visit to each site (Field type: abundance)
eff_4: Number of people collecting leeches per survey as a measure of survey effort for the fourth visit to each site (Field type: abundance)
date.1: Julian date of visit 1 (Field type: numeric)
date.2: Julian date of visit 2 (Field type: numeric)
date.3: Julian date of visit 3 (Field type: numeric)
date.4: Julian date of visit 4 (Field type: numeric)
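The date.1 to date.4 fields store the visit date as a day count since the beginning of the year. A minimal sketch of converting such a value to a calendar date, assuming that day 1 corresponds to 1 January of the survey year (the dataset description does not state the offset explicitly, and the function name is hypothetical):

```python
from datetime import date, timedelta


def day_of_year_to_date(year: int, day_of_year: int) -> date:
    """Convert a 1-based day-of-year count to a calendar date.

    Assumes day 1 is 1 January of the given year.
    """
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


# Day 32 of 2015 falls on 1 February 2015.
print(day_of_year_to_date(2015, 32))
# 2016 is a leap year, so day 245 falls on 1 September 2016.
print(day_of_year_to_date(2016, 245))
```

The survey year has to be supplied separately (2015 or 2016, depending on the worksheet), because the day count alone does not identify it.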

Site specific covariates (described in worksheet covariates)

Description: Vegetation structure data associated with each site for which leech surveys were conducted. The metrics include canopy height, Moran's I and plant-area-index. These data were extracted from LiDAR data with a 50 m² buffer around the centroid for each site.

Number of fields: 6
Number of data rows: 169
Fields:

site: SAFE second order point code (Field type: location)
tch: Top of canopy height per site (Field type: numeric)
canopy_height_moran: Habitat heterogeneity - Moran's I - per site (Field type: numeric)
canopy_height_sd: Standard deviation of canopy height (Field type: numeric)
pai_mean: Mean plant area index at site (Field type: numeric)
pai_sd: Plant area index standard deviation (Field type: numeric)

Mammal detections (described in worksheet mammals)

Description: This dataset contains the mammal detections recorded from camera traps at a subset of the leech survey locations. Sampling effort is also included as a measure of survey effort.

Number of fields: 27
Number of data rows: 83
Fields:

Camera: Name of camera (Field type: location)
CTNs: Measure of trapping effort - number of nights the cameras were operational (Field type: numeric)
Asian Elephant: Count of detections for this taxon (Field type: abundance)
Banded Civet: Count of detections for this taxon (Field type: abundance)
Banteng: Count of detections for this taxon (Field type: abundance)
Bearded Pig: Count of detections for this taxon (Field type: abundance)
Bornean Yellow Muntjac: Count of detections for this taxon (Field type: abundance)
Common Palm Civet: Count of detections for this taxon (Field type: abundance)
Greater Mouse-deer: Count of detections for this taxon (Field type: abundance)
Leopard Cat: Count of detections for this taxon (Field type: abundance)
Lesser Mouse-deer: Count of detections for this taxon (Field type: abundance)
Long-tailed Macaque: Count of detections for this taxon (Field type: abundance)
Long-tailed Porcupine: Count of detections for this taxon (Field type: abundance)
Malay Civet: Count of detections for this taxon (Field type: abundance)
Malay Porcupine: Count of detections for this taxon (Field type: abundance)
Marbled Cat: Count of detections for this taxon (Field type: abundance)
Masked Palm Civet: Count of detections for this taxon (Field type: abundance)
Moonrat: Count of detections for this taxon (Field type: abundance)
Mousedeer sp.: Count of detections for this taxon (Field type: abundance)
Muntjac sp.: Count of detections for this taxon (Field type: abundance)
Orangutan: Count of detections for this taxon (Field type: abundance)
Pig-tailed Macaque: Count of detections for this taxon (Field type: abundance)
Red Muntjac: Count of detections for this taxon (Field type: abundance)
Sambar Deer: Count of detections for this taxon (Field type: abundance)
Sun Bear: Count of detections for this taxon (Field type: abundance)
Sunda Pangolin: Count of detections for this taxon (Field type: abundance)
Thick-spined Porcupine: Count of detections for this taxon (Field type: abundance)

Date range: 2015-02-01 to 2016-12-31
Latitudinal extent: 4.5000 to 5.0700
Longitudinal extent: 116.7500 to 117.8200

Taxonomic coverage: All taxon names are validated against the GBIF backbone taxonomy. If a dataset uses a synonym, the accepted usage is shown followed by the dataset usage in brackets. Taxa that cannot be validated, including new species and other unknown taxa, morphospecies, functional groups and taxonomic levels not used in the GBIF backbone are shown in square brackets.

- Animalia
- - Chordata
- - - Mammalia
- - - - Rodentia
- - - - - Hystricidae
- - - - - - Hystrix
- - - - - - - Hystrix brachyura
- - - - - - - Hystrix crassispinis
- - - - - - Trichys
Data from: Sialic acid on avian erythrocytes
catalog.data.gov
data.amerigeoss.org
Updated Nov 12, 2020
Cite
U.S. Environmental Protection Agency (2020). Sialic acid on avian erythrocytes [Dataset]. https://catalog.data.gov/dataset/sialic-acid-on-avian-erythrocytes
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Understanding variation in physiological traits across taxa is a central question in evolutionary biology that has wide-ranging implications in biomedicine, disease ecology, and environmental protection. Sialic acid (Sia), and in particular, 5-N-acetylneuraminic acid (Neu5Ac), is chemically bound to galactose and the underlying glycan via α2–3 or α2–6 glycosidic linkage (i.e., Siaα2–3Galactose or Siaα2–6Galactose), conferring two different cell surface structures that affect cell to cell communication and interactions with foreign agents including microparasites and toxins. As an initial step towards understanding variation of Sia across the class Aves, we collected red blood cells (RBCs or erythrocytes) and measured Sia quantity in 76 species and 340 individuals using HPLC-MS/MS and glycosidic linkage type in 24 species and 105 individuals using hemagglutination assay. Although Sia quantity did not, α2–6 glycosidic linkage did exhibit a discernible phylogenetic pattern as evaluated by a phylogenetic signal (λ) value of 0.7. Sia quantity appeared to be higher in after hatch year birds than hatch year birds (P < 0.05); moreover, ~80% of the measured Sia across all individuals or species was expressed by ~20% of the individuals or species. Lastly, as expected, we detected a minimal presence of 5-N-glycolylneuraminic acid in the avian RBCs tested. These data provide novel insights and a large baseline dataset for further study on the variability of Sia in the class Aves which might be useful for understanding Sia dependent processes in birds.

This dataset is not publicly accessible because: These data are not EPA-owned. It can be accessed through the following means: The data can be accessed by contacting the corresponding author of the manuscript. The corresponding author is Mark Jankowski (jankowski.mark@epa.gov). Alternatively, the data can be accessed by contacting the principal investigator of the funding entity, Jeanne Fair (jmfair@lanl.gov).

Format: As described in the manuscript, several hundred birds were sampled to determine the quantity of sialic acid and the glycosidic linkage type present on the blood cell surface of those sampled birds. The information was used to summarize the variation of sialic acid on avian red blood cells from an individual and a phylogenetic perspective. Therefore, the dataset includes all sample identification information including date and location of sample acquisition, bird taxonomic information, and sialic acid results. Citation information for this dataset can be found in the EDG's Metadata Reference Information section and Data.gov's References section.
Complete Blood Count (CBC)
kaggle.com
Updated Aug 1, 2024
Cite
Muhammad Noukhez (2024). Complete Blood Count (CBC) [Dataset]. https://www.kaggle.com/datasets/mdnoukhej/complete-blood-count-cbc
Dataset updated
Aug 1, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Muhammad Noukhez
Description
Dataset Description:

This dataset is a comprehensive collection of Complete Blood Count (CBC) images, meticulously organized to support machine learning and deep learning projects, especially in the domain of medical image analysis. The dataset's structure ensures a balanced and systematic approach to model development, validation, and testing.

Dataset Breakdown:

Training Images: 300

Validation Images: 60

Test Images: 60

Annotations: Detailed annotations included for all images

Overview:

The Complete Blood Count (CBC) is a crucial test used in medical diagnostics to evaluate the overall health and detect a variety of disorders, including anemia, infection, and many other diseases. This dataset provides a rich source of CBC images that can be used to train machine learning models to automate the analysis and interpretation of these tests.

Data Composition:

Training Set:

Contains 300 images

These images are used to train machine learning models, enabling them to learn and recognize patterns associated with various blood cell types and conditions.

Validation Set:

Contains 60 images

Used to tune the models and optimize their performance, ensuring that the models generalize well to new, unseen data.

Test Set:

Contains 60 images

Used to evaluate the final model performance, providing an unbiased assessment of how well the model performs on new data.

Annotations:

Each image in the dataset is accompanied by detailed annotations, which include information about the different types of blood cells present and any relevant diagnostic features. These annotations are essential for supervised learning, allowing models to learn from labeled examples and improve their accuracy and reliability.

Key Features:

High-Quality Images: All images are of high quality, making them suitable for a variety of machine learning tasks, including image classification, object detection, and segmentation.

Comprehensive Annotations: Each image is thoroughly annotated, providing valuable information that can be used to train and validate models.

Balanced Dataset: The dataset is split into distinct training (300), validation (60), and test (60) sets, so models can be tuned and evaluated on images they were not trained on.

Applications:

This dataset is ideal for researchers and practitioners in the fields of machine learning, deep learning, and medical image analysis. Potential applications include:

Automated CBC Analysis: Developing algorithms to automatically analyze CBC images and provide diagnostic insights.

Blood Cell Classification: Training models to accurately classify different types of blood cells, which is critical for diagnosing various blood disorders.

Educational Purposes: Using the dataset as a teaching tool to help students and new practitioners understand the complexities of CBC image analysis.

Usage Notes:

Data Augmentation: Users may consider applying data augmentation techniques to increase the diversity of the training data and improve model robustness.

Preprocessing: Proper preprocessing, such as normalization and noise reduction, can enhance model performance.

Evaluation Metrics: It is recommended to use standard evaluation metrics such as accuracy, precision, recall, and F1-score to assess model performance.
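
The usage notes above can be sketched in plain Python. This is an illustration under stated assumptions, not the dataset author's pipeline: images are represented as hypothetical nested lists of grayscale pixel values, min-max scaling stands in for "normalization", a horizontal flip stands in for "data augmentation", and the metrics are computed for a simplified binary (0/1) labelling rather than the multi-class blood cell task.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]  # hypothetical grayscale image: rows of pixel values


def min_max_normalize(image: Image) -> Image:
    """Rescale pixel values to the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


def horizontal_flip(image: Image) -> Image:
    """Mirror the image left to right (a simple augmentation)."""
    return [list(reversed(row)) for row in image]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy inputs, invented for illustration only.
toy_image = [[0.0, 128.0], [255.0, 64.0]]
normalized = min_max_normalize(toy_image)
flipped = horizontal_flip(toy_image)
metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(normalized, flipped, metrics)
```

For a multi-class task, the same counts would typically be computed per class and then averaged; which averaging scheme is appropriate depends on the class distribution, which the description does not state.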

Conclusion:

This CBC dataset is a valuable resource for anyone looking to advance the field of automated medical diagnostics through machine learning and deep learning. With its high-quality images, detailed annotations, and balanced composition, it provides the necessary foundation for developing accurate and reliable models for CBC analysis.
Table_2_Predictors of high SARS-CoV-2 immunoglobulin G titers in COVID-19...
frontiersin.figshare.com
figshare.com
docx
Updated Jun 14, 2023
+ more versions
Jingyun Tang; Humin Liu; Qing Wang; Xiaobo Gu; Jia Wang; Wenjun Li; Yinglan Luo; Yan Li; Lan Deng; Yue Luo; Xinman Du; Donglin Tan; Xuemei Fu; Xue Chen (2023). Table_2_Predictors of high SARS-CoV-2 immunoglobulin G titers in COVID-19 convalescent whole-blood donors: a cross-sectional study in China.docx [Dataset]. http://doi.org/10.3389/fimmu.2023.1191479.s002
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fimmu.2023.1191479.s002
Dataset updated
Jun 14, 2023
Dataset provided by
Frontiers
Authors
Jingyun Tang; Humin Liu; Qing Wang; Xiaobo Gu; Jia Wang; Wenjun Li; Yinglan Luo; Yan Li; Lan Deng; Yue Luo; Xinman Du; Donglin Tan; Xuemei Fu; Xue Chen
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
Background: Demographic information has been shown to help predict high antibody titers of COVID-19 convalescent plasma (CCP) in CCP donors. However, there is no research on the Chinese population and little evidence on whole-blood donors. Therefore, we aimed to investigate these associations among Chinese blood donors after SARS-CoV-2 infection. Methods: In this cross-sectional study, 5,064 qualified blood donors with confirmed or suspected SARS-CoV-2 infection completed a self-reported questionnaire and underwent tests of SARS-CoV-2 Immunoglobulin G (IgG) antibody and ABO blood type. Logistic regression models were used to calculate odds ratios (ORs) for high SARS-CoV-2 IgG titers according to each factor. Results: In total, 1,799 participants (with SARS-CoV-2 IgG titers ≥ 1:160) had high-titer CCPs. Multivariable analysis showed that a 10-year increment in age and earlier donation were associated with higher odds of high-titer CCP, while being medical personnel was associated with lower odds. The ORs (95% CIs) of high-titer CCP were 1.17 (1.10–1.23, p < 0.001) and 1.41 (1.25–1.58, p < 0.001) for each 10-year increment in age and earlier donation, respectively. The OR of high-titer CCP was 0.75 (0.60–0.95, p = 0.02) for medical personnel. Female early donors were associated with increased odds of high-titer CCP, but this association was insignificant for later donors. Donating after 8 weeks from the onset was associated with decreased odds of having high-titer CCP compared to donating within 8 weeks from the onset, and the HR was 0.38 (95% CI: 0.22–0.64, p
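
In a logistic regression model, an odds ratio is the exponential of a coefficient, so odds ratios for larger increments compound multiplicatively. The sketch below uses only the age OR reported in the abstract above; the 20-year value is an illustration that follows from the model's linearity in log-odds, not a result reported by the study, and the function name is hypothetical.

```python
import math

or_per_10_years = 1.17  # reported OR for each 10-year increment in age

# Log-odds coefficient per 10-year increment implied by the reported OR.
beta_per_10_years = math.log(or_per_10_years)


def odds_ratio_for_years(years: float) -> float:
    """Implied OR for an age difference of `years`, assuming linearity in log-odds."""
    return math.exp(beta_per_10_years * years / 10)


print(round(odds_ratio_for_years(20), 4))
```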
Additional file 6 of Correction for both common and rare cell types in blood...
figshare.com
springernature.figshare.com
xlsx
Updated May 31, 2023
Damiano Pellegrino-Coppola; Annique Claringbould; Maartje Stutvoet; Dorret I. Boomsma; M. Arfan Ikram; P. Eline Slagboom; Harm-Jan Westra; Lude Franke (2023). Additional file 6 of Correction for both common and rare cell types in blood is important to identify genes that correlate with age [Dataset]. http://doi.org/10.6084/m9.figshare.14220878.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.14220878.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Damiano Pellegrino-Coppola; Annique Claringbould; Maartje Stutvoet; Dorret I. Boomsma; M. Arfan Ikram; P. Eline Slagboom; Harm-Jan Westra; Lude Franke
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 6: Table S6. Mean squared errors, Pearson correlation coefficient (r) and Spearman correlation coefficient (rho) of residual gene expression with age. LL, LifeLines DEEP; LLS, Leiden Longevity Study; NTR, Netherlands Twin Registry; RS, Rotterdam Study; EM, extended model; IM, initial model; EM-age, extended model without age as covariate; IM-age, initial model without age as covariate.
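
The three statistics named in Table S6 can be computed as sketched below. This is a plain-Python illustration, not the authors' code: the function names are hypothetical, and the ages and residual expression values are invented toy data.

```python
from typing import List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between paired values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data, invented for illustration only.
age = [20.0, 35.0, 50.0, 65.0, 80.0]
residual_expression = [0.1, 0.4, 0.2, 0.9, 3.0]
print(pearson_r(age, residual_expression), spearman_rho(age, residual_expression))
```

Spearman's rho depends only on the ordering of the values, so it is less sensitive than Pearson's r to the single large residual in the toy data.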
Distribution of blood types in the U.S. as of 2023 (Statista): 6 scholarly articles cite this dataset.