From 2020 through 2021, the Assessment Capacities Project (ACAPS) Government Measures dataset tracked government responses to the COVID-19 pandemic over time using information compiled by ACAPS analysts and volunteers from the University of Copenhagen and University of Lund. Information was obtained from public sources, including governments, media, United Nations agencies, and other organizations, and these sources are identified in the dataset.
Government measures were grouped into five categories:
An ancillary dataset produced by the ACAPS Secondary Impacts Analytical Framework tracked secondary impacts across a wide range of relevant themes, such as economy, health, migration, and education.
The Community Health Resources and Needs Assessment (CHRNA) project is a large-scale health needs assessment in diverse, low-income Asian American communities in New York City. The project uses a community-engaged and community venue-based approach to assess existing health issues, available resources, and best approaches to meet community health needs. Questions asked in the CHRNAs assess various determinants of health, including length of residence in the United States, English language proficiency, educational attainment, employment and income, perceived health, health insurance and access to care, nutrition and physical activity, mental health, screening for cancer and other chronic diseases, sleep deprivation, and connections to social and religious environments.
The first round of CHRNAs, conducted between 2004 and 2006, surveyed approximately 100 individuals from each of the following Asian subgroups: Cambodians, Chinese, Filipinos, Japanese, Koreans, South Asians, Thai, and Vietnamese (n=1,201).
This dataset contains recordings from the rat frontal cortex (brain regions mPFC, OFC, ACC, and M2) during wake-sleep episodes in which at least 7 minutes of wakefulness are followed by 20 minutes of sleep. The data were recorded with silicon probe electrodes in the frontal cortices of male Long Evans rats aged 4 to 7 months. Recordings were not tied to any specific behavior, task, or stimulus; each animal was left alone in its home cage.
Recorded data include both local field potentials (LFP) and spikes. There are 28 recording sessions from 11 animals, with 1,360 units recorded in total, of which 1,121 were determined to be stable (995 putative excitatory and 126 putative inhibitory units). On average, each recording session includes two wake-sleep episodes (7 minutes of wakefulness followed by 20 minutes of sleep).
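The episode criterion described above (at least 7 minutes of wakefulness followed by 20 minutes of sleep) can be illustrated as a scan over a scored sleep-state sequence. This is a minimal sketch, not the investigators' scoring pipeline: it assumes hypothetical per-epoch labels "wake" and "sleep" (collapsing any NREM/REM distinction), 60-second epochs by default, and treats 20 minutes as a minimum sleep duration.

```python
from itertools import groupby
from typing import List, Tuple


def find_wake_sleep_episodes(
    states: List[str],
    epoch_s: float = 60.0,
    min_wake_min: float = 7.0,
    min_sleep_min: float = 20.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) epoch indices of wake runs immediately followed by sleep runs.

    `states` is a per-epoch label sequence using the hypothetical labels
    "wake" and "sleep"; `end` is exclusive.
    """
    # Run-length encode the labels into (state, start_index, length) runs.
    runs = []
    idx = 0
    for state, group in groupby(states):
        length = sum(1 for _ in group)
        runs.append((state, idx, length))
        idx += length

    # Convert the minute thresholds into a number of epochs.
    min_wake_epochs = min_wake_min * 60.0 / epoch_s
    min_sleep_epochs = min_sleep_min * 60.0 / epoch_s

    episodes = []
    # Compare each run with the run that directly follows it.
    for (s1, start1, n1), (s2, start2, n2) in zip(runs, runs[1:]):
        if s1 == "wake" and s2 == "sleep" and n1 >= min_wake_epochs and n2 >= min_sleep_epochs:
            episodes.append((start1, start2 + n2))
    return episodes


# One-minute epochs: 8 min wake, 25 min sleep, then 3 min wake.
labels = ["wake"] * 8 + ["sleep"] * 25 + ["wake"] * 3
print(find_wake_sleep_episodes(labels))  # [(0, 33)]
```

A wake run shorter than the threshold, or a sleep run that is interrupted before 20 minutes, does not qualify; a real pipeline would also need rules for brief arousals, which are not described here.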
This dataset was collected for a combined retrospective and prospective cross-sectional study to establish risk factors for infection after treatment for intracerebral hemorrhage and subarachnoid hemorrhage and to determine the impact of those infections on long-term outcomes. Data were collected retrospectively from Tisch Hospital records from January 2013 to December 2014 and have been collected prospectively from January 2015 to the present; the study aims to recruit an additional 1,000 patients by 2027.
Patients are included in the study if they are over 18 years of age and have a new diagnosis of intracerebral hemorrhage or subarachnoid hemorrhage requiring admission to, or consultation by, acute neurology faculty members at NYU Langone Medical Center. Prospective patients are included only if the patient or next of kin consents to participate in follow-up phone interviews at 3 and 12 months.
Data collected from both retrospectively and prospectively enrolled patients include:
Data collected only from prospectively enrolled patients include:
With an emphasis on reaching historically underrepresented populations, the All of Us Research Program recruits adults aged 18 and above across the United States to share their health data to enable new insights into human health and research on precision medicine. Participants contribute electronic health records (EHR), survey responses, biospecimens, biometric data from wearable devices, and physical measurements.
The six All of Us surveys assess the areas listed below:
There are currently three tiers of data access.
The Gallup U.S. Daily Tracking poll was conducted between 2008 and 2017 to collect Americans' opinions and perceptions on political and economic current events. It included two parallel surveys, the U.S. Daily and the Gallup-Sharecare Well-Being Index. Gallup interviewed approximately 1,000 U.S. adults every day; half responded to the U.S. Daily survey and the other half to the Gallup-Sharecare Well-Being Index survey. The U.S. Daily survey includes information about political affiliation, presidential approval ratings, economic confidence, and religion. The Gallup-Sharecare Well-Being Index includes information on health insurance, exercise, dietary choices, and overall well-being.
The National Center for Advancing Translational Sciences (NCATS) has systematically compiled clinical, laboratory and diagnostic data from electronic health records to support COVID-19 research efforts via the National COVID Cohort Collaborative (N3C) Data Enclave. As of August 2, 2022, the repository contains information from over 15 million patients (including 5.8 million COVID-19 positive patients) across the United States.
The N3C Data Enclave is organized into 3 levels of data with varying access restrictions:
The dataset includes the names, employee sizes, asset sizes, business credit score, owner information, address, longitude and latitude, and census tract information for all businesses in the United States from 1997 to 2022. For nursing homes and hospitals, the dataset also categorizes capacity by the number of beds.
The New York City Community Health Survey (CHS) is a telephone survey conducted annually by the New York City Department of Health and Mental Hygiene (DOHMH), Division of Epidemiology, Bureau of Epidemiology Services. CHS provides robust data on the health of New Yorkers, including neighborhood, borough, and citywide estimates on a broad range of chronic diseases and behavioral risk factors. The data are analyzed and disseminated to inform health program decisions and to increase understanding of the relationship between health behavior and health status. For more information, see EpiQuery: https://a816-health.nyc.gov/hdi/epiquery/visualizations?PageType=ps&PopulationSource=CHS
This dataset includes information from surveys about quality improvement administered to newly licensed registered nurses who participate in the RN Work Project. The purpose of this study was to describe what newly licensed registered nurses working in hospitals learned about quality improvement in their education programs and workplaces.
Quality improvement topics covered by the survey include patient-centered care; evidence-based practice; standardized practices for restraint and seclusion, infection control and pain management; use of information technology or strategies to reduce reliance on memory; participation in analyzing errors and designing system improvements; use of national patient safety resources, initiatives or regulations; and use of specific quality improvement models, specifically:
Cochlear implant use requires neuroplasticity within the central auditory system. Despite extensive studies on how cochlear implants activate the auditory system, cochlear implant-related neuroplasticity remains poorly understood. This study investigated behavioral responses and neural activity in the locus coeruleus and auditory cortex of deafened rats fitted with multi-channel cochlear implants. A total of 59 adult female Long Evans rats were used for these studies: 16 TH-cre rats for training, 4 other TH-cre rats for photometry with the cochlear implant, 2 other TH-cre rats for photometry in normal-hearing rats, 10 other TH-cre rats for optogenetics, 14 rats for measuring auditory brainstem responses (ABRs) or electrical ABRs (EABRs) pre- and post-deafening (11 wild-type and 3 TH-cre rats), 4 untrained wild-type rats for multi-unit recordings, 5 rats for in vivo whole-cell recording (2 untrained and 3 trained; 2 of the trained rats were TH-cre), and 4 untrained wild-type rats for cochleogram analysis. The dataset contains behavioral, fiber photometry, immunohistochemistry, and electrophysiology data.
This dataset was generated through a two-phase mixed-methods study designed to evaluate the patient, provider, and clinical-level factors related to race and healthcare quality that impact medication adherence among Black and White patients with hypertension. The study population includes 104 hypertensive Black and White patients receiving care at the Bellevue Ambulatory Care Practice. The dataset includes audio-recorded primary care visits and quantitative information about medication adherence that was continuously collected for 3 months after the initial visit using an electronic drug monitoring device (EMD).
The Methods in Infant/Toddler Neuroimaging (MINI) study surveyed investigators at infant and toddler research laboratories across the globe to collect information on preferred practices for MRI acquisition, scan success rates, visit preparation, scanning protocols, family accommodations, study design, and addressing incidental findings.
The survey was distributed in February 2021 through the Fetal, Infant, and Toddler Neuroimaging Group (FIT’NG) listserv, and a total of 62 investigators from 38 institutions responded. The 80 survey items covered infant/toddler MRI data acquisition, COVID precautions, and demographics of the respondent. Branching logic directed respondents to questions relevant to their expertise: newborns (birth-2 months), infants (3 months-1.5 years), or toddlers (1.5–4 years). Each question received between 37 and 54 responses.
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. Proteomic analysis for each CPTAC study is carried out independently by Proteomic Characterization Centers (PCCs) using a variety of protein fractionation techniques, instrumentation, and workflows. Mass spectrometry and related data files are organized into datasets by study, sub-proteome, and analysis site.
This deidentified imaging dataset consists of raw k-space data and DICOM images organized into several sub-dataset groups. Raw and DICOM data have been deidentified via conversion to the vendor-neutral ISMRMRD format and the RSNA Clinical Trial Processor, respectively. Each DICOM image was also manually inspected for unexpected protected health information (PHI), with spot checking of both metadata and image content.
Knee MRI: Data from more than 1,500 fully sampled knee MRIs obtained on 3 and 1.5 Tesla magnets and DICOM images from 10,000 clinical knee MRIs also obtained at 3 or 1.5 Tesla. The raw dataset includes coronal proton density-weighted images with and without fat suppression. The DICOM dataset contains coronal proton density-weighted images with and without fat suppression, axial proton density-weighted images with fat suppression, sagittal proton density-weighted images, and sagittal T2-weighted images with fat suppression.
Brain MRI: Data from 6,970 fully sampled brain MRIs obtained on 3 and 1.5 Tesla magnets. The raw dataset includes axial T1-weighted, T2-weighted, and FLAIR images. Some of the T1-weighted acquisitions included administration of a contrast agent.
Additional information on the file structure, data loader, and transforms is available on GitHub.
Prostate MRI: Data obtained on 3 Tesla magnets from 312 male patients referred for clinical prostate MRI exams. The raw dataset includes axial T2-weighted and axial diffusion-weighted images for each of the 312 exams.
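Because the raw sub-datasets store k-space (spatial-frequency) measurements rather than images, viewing a fully sampled slice requires an inverse 2D Fourier transform followed by a coil combination. The sketch below shows one common convention under stated assumptions: NumPy is available, the k-space array has shape (coils, ky, kx) with its center in the middle of the array, and root-sum-of-squares is used to combine coils. Loading arrays from the distributed files is not shown; the GitHub documentation describes the actual file structure and data loader.

```python
import numpy as np


def kspace_to_image(kspace: np.ndarray) -> np.ndarray:
    """Reconstruct a magnitude image from fully sampled 2D Cartesian k-space.

    Assumes a complex array of shape (coils, ky, kx) with the k-space center
    in the middle of the array; single-coil data can use coils=1.
    """
    axes = (-2, -1)
    # Move the k-space center to index 0, apply an orthonormal 2D inverse FFT,
    # then shift the result so the field of view is centered.
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(kspace, axes=axes), axes=axes, norm="ortho"),
        axes=axes,
    )
    # Root-sum-of-squares coil combination yields one real-valued image.
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))


# Round trip on synthetic data: simulate centered k-space from a known 2-coil image.
rng = np.random.default_rng(0)
images = rng.standard_normal((2, 64, 64)) + 1j * rng.standard_normal((2, 64, 64))
kspace = np.fft.fftshift(
    np.fft.fft2(np.fft.ifftshift(images, axes=(-2, -1)), axes=(-2, -1), norm="ortho"),
    axes=(-2, -1),
)
recon = kspace_to_image(kspace)
print(recon.shape)  # (64, 64)
```

The shifts matter because the FFT expects the zero-frequency sample at index 0, while stored k-space is usually centered; the orthonormal scaling keeps image intensities comparable to the k-space energy.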
The dataset includes the names, employee sizes, asset sizes, business credit score, owner information, address, longitude and latitude, and census tract information for all businesses in New York City from 2010 to 2014. For nursing homes and hospitals, the dataset also categorizes capacity by the number of beds.
Oxytocin is a neuropeptide that is released in the central nervous system and is involved in a wide range of behaviors, including reproduction, parental care, and pair bonding. In addition, peripheral oxytocin is important for milk ejection during nursing and uterine contractions during labor. Oxytocin is produced mainly in the paraventricular nucleus (PVN) and supraoptic nucleus of the hypothalamus. This study describes a neural circuit that routes auditory information about infant vocalizations to mouse oxytocin neurons. The investigators performed in vivo electrophysiological recordings and photometry from identified oxytocin neurons in awake maternal mice presented with pup calls, including cell-attached and whole-cell recordings from PVN oxytocin neurons and other, optically unresponsive PVN neurons in awake, head-fixed mouse dams. The dataset contains electrophysiology, behavioral, histology, and fiber photometry data. This study reveals a circuit that unlocks central oxytocin release and maternal behavior in response to pup calls.
NYUTron is a large language model-based system developed with the objective of integrating into clinical workflows centered on structured and unstructured notes and the real-time placement of electronic orders. The development team queried electronic health records from all NYU Langone facilities to generate two types of datasets: pre-training datasets ("NYU Notes", "NYU Notes–Manhattan", "NYU Notes–Brooklyn"), which contain a total of 10 years of unlabeled inpatient clinical notes (387,144 patients, 4.1 billion words), and fine-tuning datasets for five tasks ("NYU Readmission" and its site-specific subsets "NYU Readmission–Manhattan" and "NYU Readmission–Brooklyn", "NYU Mortality", "NYU Binned LOS", "NYU Insurance Denial", "NYU Binned Comorbidity"), each containing 1 to 10 years of inpatient clinical notes (55,791 to 413,845 patients, 51 to 87 million words) with task-specific labels (2 to 4 classes). In addition, the team used two publicly available datasets, i2b2-2012 and MIMIC-III, for testing and fine-tuning.
To assess the model's predictive capabilities, NYUTron was applied to a battery of five tasks, three clinical and two operational: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay (LOS) prediction, and insurance denial prediction. In addition, a detailed analysis of the 30-day readmission task was performed to investigate data efficiency, generalizability, deployability, and potential clinical impact. NYUTron achieved an area under the curve (AUC) of 78.7–94.9%, an improvement of 5.36–14.7% over traditional models.
The investigators have shared code to replicate the pretraining, fine-tuning and testing of the predictive models obtained with NYU Langone electronic health records, as well as preprocessing code for the i2b2-2012 dataset and implementation steps for MIMIC-III.
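The NYUTron results are reported as AUC. For a binary task such as 30-day readmission, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; it is an illustration of the metric on hypothetical labels and scores, not the investigators' evaluation code, and it does not cover the multi-class tasks (which need an extension such as one-vs-rest averaging).

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for binary labels (1 = positive) via pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical readmission labels and model scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic and meant only for clarity; on cohorts of hundreds of thousands of patients a sort-based rank computation or an established library routine would be the practical choice.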
The Centers for Disease Control and Prevention (CDC) has released deidentified line-listed datasets based on COVID-19 cases reported to CDC. Data suppression is applied to low-frequency records (fewer than 5), indirect identifiers, and uncommon combinations of demographic characteristics.
The public-use dataset includes 12 data elements assessing demographic characteristics, testing and reporting dates, health status, comorbidities, and disease outcomes. The restricted-access dataset contains an additional 20 elements (32 in total), including state and county of residence information, details on the delivery of care, and symptoms experienced.
Datasets were previously updated on a monthly basis. Although the data remain publicly available, reporting of new data was discontinued as of July 1, 2024.
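The low-frequency suppression rule for the CDC line lists can be illustrated with a generic small-cell suppression pass: count each combination of quasi-identifying fields and blank those fields wherever a combination appears fewer than 5 times. This is a minimal sketch of the general technique, not CDC's actual procedure, and the field names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]


def suppress_small_cells(
    records: List[Record],
    quasi_identifiers: Sequence[str],
    threshold: int = 5,
) -> List[Record]:
    """Blank quasi-identifier values in combinations seen fewer than `threshold` times."""

    def combo(record: Record) -> Tuple[Optional[str], ...]:
        return tuple(record[field] for field in quasi_identifiers)

    # Count how often each combination of quasi-identifier values occurs.
    counts = Counter(combo(r) for r in records)
    result = []
    for record in records:
        cleaned = dict(record)  # copy so the input records are not modified
        if counts[combo(record)] < threshold:
            for field in quasi_identifiers:
                cleaned[field] = None
        result.append(cleaned)
    return result


rows = [{"age_group": "20-29", "sex": "Female", "county": "A"}] * 5
rows.append({"age_group": "80+", "sex": "Male", "county": "B"})
out = suppress_small_cells(rows, ["age_group", "sex", "county"])
print(out[0]["county"], out[-1]["county"])  # A None
```

A single pass like this can create new rare combinations among the suppressed rows; production disclosure-control methods typically iterate or use formal criteria such as k-anonymity.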
INSIGHT Clinical Research Network is a project funded by the Patient-Centered Outcomes Research Institute (PCORI) and is part of the PCORnet program. The INSIGHT longitudinal datasets bring together New York City organizations including medical schools, medical centers, research support organizations, and practice-based research networks. Over 160 million patient encounters and 365 million diagnoses have been recorded in the central data repository. The INSIGHT datasets include longitudinally collected clinical, patient-reported, and patient-generated information, as well as claims data. Datasets are available at the de-identified patient level, identifiable patient level, and patient cohort level. COVID-19 data are also available.