WAVES is a pediatric physiological waveform dataset containing ECG, respiratory, plethysmogram, arterial blood pressure, and a variety of other high-frequency waveforms extracted from bedside monitors.
For code examples and documentation, please refer to the WAVES utilities Python package or its associated source code.
Waveform data is stored in compressed and base-64 encoded .csv files that cannot be properly loaded and decompressed using standard csv libraries. The utility codebase provides data loaders to interface with the raw data, and usage examples like basic plotting.
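To illustrate why a standard csv reader is not enough on its own, here is a minimal round-trip sketch of a compressed, base-64 encoded cell. The description above does not state the compression scheme or the binary layout, so zlib and little-endian float32 are assumptions made only for this illustration, and `encode_samples` and `decode_samples` are hypothetical helpers, not part of the WAVES utilities. For real data, use the loaders in the WAVES utilities package.

```python
import base64
import struct
import zlib


def encode_samples(samples):
    """Pack floats as little-endian float32, compress, then base-64 encode.

    ASSUMPTION: zlib and float32 are illustrative choices, not the WAVES format.
    """
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_samples(cell):
    """Reverse encode_samples: base-64 decode, decompress, unpack float32."""
    raw = zlib.decompress(base64.b64decode(cell))
    count = len(raw) // 4  # 4 bytes per float32 value
    return list(struct.unpack(f"<{count}f", raw))


# Round trip on synthetic values that are exactly representable in float32.
original = [0.0, 0.5, -1.25, 2.0]
cell = encode_samples(original)   # an ASCII string, safe to store in a csv cell
decoded = decode_samples(cell)
print(decoded == original)
```

The encoded cell is plain ASCII, so a csv reader can load it as a string, but the waveform values only become usable after the extra decode and decompress steps.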
As an unrestricted preview of the dataset, the WAVES utilities code includes a very small sample dataset .csv file in the format you would receive if you extracted or filtered, and then downloaded, a waveform dataset .csv file from Redivis. The "Supporting files" section of the WAVES dataset on Redivis also includes a larger subset of ~25 samples running for roughly 8 hours each.
BY DOWNLOADING THE SAMPLE DATA FILE, YOU ARE AGREEING TO THE TERMS OF THE PROVIDED DATA USE AGREEMENT (DUA)
Initial release of WAVES to validate and document user access
The data comprises all the nationwide parcel shapes, owner information, address, and any columns related to ownership or public/private land designations, added during the term of the license agreement. The attributes available may vary from parcel to parcel. Data are updated monthly and made available in GeoJSON format.
Refer to LoveLand’s FAQs on Parcel Data for more information on the data, how it is collected, standardized, etc.
Parcel data are made available as tables for each state and the District of Columbia.
Table names use the abbreviation for each state followed by the month and year of the update.
Individual parcel data files are available in four formats: GeoJSON, Geopackage, Geodatabase and Shapefile. Instructions for accessing data can be found here:
https://code.stanford.edu/sul-socialsciences/landgrid-parcel-data#access
https://aimistanford-web-api.azurewebsites.net/licenses/f1f352a6-243f-4905-8e00-389edbca9e83/view
Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. INSPECT contains data from 19,438 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). Using our provided dataset, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and fused models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset for enabling reproducible research on strategies for integrating 3D medical imaging and EHR data. NOTE: This is the first part of the release, owing to PHI review. It includes 20,078 CT scans and 21,266 impression sections; the EHR modality data will be uploaded to the Stanford Redivis website (https://redivis.com/Stanford).
https://spdx.org/licenses/CC0-1.0.html
Introduction: Through funding agency and publisher policies, an increasing proportion of the health sciences literature is being made open access. Such an increase in access raises questions about the awareness and potential utilization of this literature by those working in health fields. Methods: A sample of physicians (N=336) and public health non-governmental organization (NGO) staff (N=92) were provided with relatively complete access to the research literature indexed in PubMed, as well as access to the point-of-care service UpToDate, for up to one year, with their usage monitored through the tracking of web-log data. The physicians also participated in a one-month trial of relatively complete or limited access. Results: The study found that participants' research interests were not satisfied by article abstracts alone nor, in the case of the physicians, by a clinical summary service such as UpToDate. On average, a third of the physicians viewed research a little more frequently than once a week, while two-thirds of the public health NGO staff viewed more than three articles a week. Those articles were published since the 2008 adoption of the NIH Public Access Policy, as well as prior to 2008 and during the maximum 12-month embargo period. A portion of the articles in each period was already open access, but complete access encouraged a viewing of more research articles. Conclusion: Those working in health fields will utilize more research in the course of their work as a result of (a) increasing open access to research, (b) improving awareness of and preparation for this access, and (c) adjusting public and open access policies to maximize the extent of potential access, through reduction in embargo periods and access to pre-policy literature.
CheXpert is a large dataset of chest X-rays and a competition for automated chest X-ray interpretation, which features uncertainty labels and radiologist-labeled reference standard evaluation sets. It consists of 224,316 chest radiographs of 65,240 patients, where the chest radiographic examinations and the associated radiology reports were retrospectively collected from Stanford Hospital. Each report was labeled for the presence of 14 observations as positive, negative, or uncertain. We decided on the 14 observations based on their prevalence in the reports and clinical relevance.
The CheXpert dataset must be downloaded separately after reading and agreeing to a Research Use Agreement. To do so, please follow the instructions on the website, https://stanfordmlgroup.github.io/competitions/chexpert/.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('chexpert', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
This polygon shapefile depicts Williamson Act Parcels with an ongoing contract within the unincorporated areas of the County of Santa Clara, California. The California Land Conservation Act, better known as the Williamson Act, has been California’s agricultural land protection program since its enactment in 1965. The Williamson Act preserves agricultural and open space lands through property tax incentives and voluntary restrictive use contracts. Private landowners voluntarily restrict their land to agricultural and compatible open space uses under minimum ten-year rolling term contracts with local governments. In return, restricted parcels are assessed for property tax purposes at a rate consistent with their actual use, rather than potential market value. The purpose of the Williamson Act is to preserve the County’s prime soils and intensive high value agricultural operations, and to discourage premature and unnecessary conversion of agricultural land to urban use. This layer is part of a collection of GIS data for Santa Clara County, California.
Currently, we have 2016 - 2020 New York State HCUP databases. Additional data will be added as it becomes available.
Data Documentation is on the HCUP Website:
State Inpatient Databases (SID)
SID Overview:
https://www.hcup-us.ahrq.gov/sidoverview.jsp
NY SID Description:
https://www.hcup-us.ahrq.gov/db/state/siddist/siddist_filecompny.jsp
SID Data Elements by State:
https://www.hcup-us.ahrq.gov/db/state/siddist/siddistvarnote2016.jsp
https://www.hcup-us.ahrq.gov/db/state/siddist/siddistvarnote2017.jsp
State Ambulatory Surgery & Services Databases (SASD)
SASD Overview:
https://www.hcup-us.ahrq.gov/db/state/sasddbdocumentation.jsp
NY SASD Description:
https://www.hcup-us.ahrq.gov/db/state/sasddist/sasddist_filecompny.jsp
SASD Data Elements by State:
https://www.hcup-us.ahrq.gov/db/state/sasddist/sasddistvarnote2016.jsp
https://www.hcup-us.ahrq.gov/db/state/sasddist/sasddistvarnote2017.jsp
State Emergency Department Databases (SEDD)
SEDD Overview:
https://www.hcup-us.ahrq.gov/db/state/sedddbdocumentation.jsp
NY SEDD Description:
https://www.hcup-us.ahrq.gov/db/state/sedddist/sedddist_filecompny.jsp
SEDD Data Elements by State:
https://www.hcup-us.ahrq.gov/db/state/sedddist/sedddistvarnote2016.jsp
https://www.hcup-us.ahrq.gov/db/state/sedddist/sedddistvarnote2017.jsp
Exports are approved on a case-by-case basis; allow 24 hours for approval. The use of exported data is only permitted for approved projects. An updated data use agreement must be submitted to HCUP by the dataset administrator before proceeding to work with the data for new projects. To initiate a new project or line of inquiry, email ortho_biostats@stanford.edu.
Passing of data to other users outside of the system is not permitted without permission of the data administrator (Jayme Koltsov). All users must complete the HCUP tutorial and data use agreement prior to data access. Email ortho_biostats@stanford.edu to obtain data access.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Two COVID-19 surveys were used to create the test dataset, both collected by teams from the National Institutes of Health (NIH) and Stanford University. The collected data were intended to assess the general topics experienced by participants during the pandemic lockdown. The test dataset comprises a total of 1,000 randomly chosen sentences, with 500 sentences selected from each survey. Each set was annotated by three separate and independent annotators. The annotators were instructed to assess the polarity of each sentence on a scale of -1 (negative), 0 (neutral), or 1 (positive). We then followed a three-step procedure to determine the final labels. First, if all three annotators agreed on a label (full agreement), that label was accepted. Second, if two out of the three agreed on a label (partial agreement), that label was also accepted. Third, if there was no agreement, the label was set as neutral (no agreement).
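The three-step labeling procedure described above can be sketched as a small function. This is an illustrative reading of the text, not the authors' code; the name `final_label` is hypothetical, and it assumes exactly three annotations per sentence, each one of -1, 0, or 1.

```python
from collections import Counter


def final_label(annotations):
    """Resolve three polarity annotations (-1, 0, 1) into one final label."""
    if len(annotations) != 3:
        raise ValueError("expected exactly three annotations")
    label, votes = Counter(annotations).most_common(1)[0]
    # Full agreement (3 votes) and partial agreement (2 votes)
    # both accept the majority label.
    if votes >= 2:
        return label
    # No agreement: all three annotators differ, so the label is neutral.
    return 0


print(final_label([1, 1, 1]))    # full agreement
print(final_label([-1, -1, 0]))  # partial agreement
print(final_label([-1, 0, 1]))   # no agreement
```

With three annotators and three possible values, "no agreement" can only mean one vote for each polarity, which is why the fallback to neutral needs no further tie-breaking.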
The CoreLogic Loan-Level Market Analytics (LLMA) for primary mortgages dataset contains detailed loan data, including origination, events, performance, forbearance and inferred modification data.
CoreLogic sources the Loan-Level Market Analytics data directly from loan servicers. CoreLogic cleans and augments the contributed records with modeled data. The Data Dictionary indicates which fields are contributed and which are inferred.
The Loan-Level Market Analytics data is aimed at providing lenders, servicers, investors, and advisory firms with the insights they need to make trustworthy assessments and accurate decisions. Stanford Libraries has purchased the Loan-Level Market Analytics data for researchers interested in housing, economics, finance and other topics related to prime and subprime first lien data.
CoreLogic provided the data to Stanford Libraries as pipe-delimited text files, which we have uploaded to Data Farm (Redivis) for preview, extraction and analysis.
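For readers unfamiliar with the pipe-delimited layout, the sketch below parses a tiny in-memory sample with Python's standard `csv` module. The column names and values (`loan_id`, `orig_date`, `orig_amount`) are made up for illustration and are not taken from the LLMA Data Dictionary; `read_pipe_delimited` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample; real LLMA field names are listed in the Data Dictionary.
sample = (
    "loan_id|orig_date|orig_amount\n"
    "A001|2015-01|250000\n"
    "A002|2015-02|180000\n"
)


def read_pipe_delimited(handle):
    """Read a pipe-delimited text stream into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="|"))


rows = read_pipe_delimited(io.StringIO(sample))
for row in rows:
    print(row["loan_id"], row["orig_amount"])
```

All values come back as strings, so numeric and date fields would still need explicit conversion after loading.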
For more information about how the data was prepared for Redivis, please see CoreLogic 2024 GitLab.
Per the End User License Agreement, the LLMA Data cannot be commingled (i.e. merged, mixed or combined) with Tax and Deed Data that Stanford University has licensed from CoreLogic, or other data which includes the same or similar data elements or that can otherwise be used to identify individual persons or loan servicers.
The 2015 major release of CoreLogic Loan-Level Market Analytics (for primary mortgages) was intended to enhance the CoreLogic servicing consortium through data quality improvements and integrated analytics. See CL_LLMA_ReleaseNotes.pdf for more information about these changes.
For more information about included variables, please see CL_LLMA_Data_Dictionary.pdf.
For more information about how the database was set up, please see LLMA_Download_Guide.pdf.
Top-level PSID Stata dataset used for the analysis in A. Auclert, "Monetary Policy and the Redistribution Channel", American Economic Review, June 2019.
"ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. The project has been instrumental in advancing computer vision and deep learning research. The data is available for free to researchers for non-commercial use." (https://www.image-net.org/index.php)
I do not hold any copyright to this dataset. This data is just a re-distribution of the data Imagenet.org shared on Kaggle. Please note that some of the ImageNet1K images are under copyright.
This version of the data is directly sourced from Kaggle, excluding the bounding box annotations. Therefore, only images and class labels are included.
All images are resized to 256 x 256.
Integer labels are assigned after ordering the class names alphabetically.
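The labeling rule above (sort the class names alphabetically, then number them) can be sketched as follows. The class names are placeholders, `build_label_map` is a hypothetical helper, and the sketch assumes plain Python string ordering, which may differ from the ordering actually used if the names mix upper and lower case.

```python
def build_label_map(class_names):
    """Map each unique class name to its index in alphabetical order."""
    return {name: index for index, name in enumerate(sorted(set(class_names)))}


# Placeholder class names, deliberately given out of order.
label_map = build_label_map(["tench", "goldfish", "hen"])
print(label_map)
```

Because the mapping depends only on the sorted names, it can be rebuilt from the class list alone without a separate lookup file.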
Please note that anyone using this data abides by the original terms: RESEARCHER_FULLNAME has requested permission to use the ImageNet database (the "Database") at Princeton University and Stanford University. In exchange for such permission, Researcher hereby agrees to the following terms and conditions:
The images are processed using [TPU VM](https://cloud.google.com/tpu/docs/users-guide-tpu-vm) via the support of Google's [TPU Research Cloud](https://sites.research.google/trc/about/).
This map denotes the locations of HUD assisted Multi-Family properties that primarily serve elderly residents. In addition, each property illustrated through this service has at least one active Service Coordinator contract or grant, Section 236 loan, Section 8 202 contract, Section 8 Farmers Home Administration (FMHA) 515 contract, Section 8 New Construction contract, Section 202 Project Assistance Contracts (PAC) contract, and Section 202 Project Rental Assistance Contract (PRAC). Please note that the data provided through this map only includes location data and attributes for those addresses that can be geocoded to an interpolated point along a street segment, or to a ZIP+4 centroid location. While not all records are able to be geocoded and mapped, we are continuously working to improve the address data quality and enhance coverage. Please consider this issue when using any datasets provided by HUD.
To learn more about the Section 202 Program visit: https://www.hud.gov/program_offices/housing/mfh/progdesc/eld202
Data Dictionary: DD_Multifamily Properties
Date of Coverage: 12/2019
Data Updated: Quarterly
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Occurrence of polymorphisms at drug resistance sites.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TDR frequency according to different algorithms.
Digital Archive of Mobile Performance (DAMP) DAMP-VP1k: an archive of Smule vocal performances from 100 singers, with 10 performances each.
The dataset contains sung musical performances from the Smule app. Data files include audio.zip with all the compressed audio files, and metadata.csv describing some metadata about each performance, including unique identifiers for each recording, song, and singer, as well as binary gender labels, region labels, and social "love" counts from the Smule app.
This archive is a subset of the DAMP-Multiple Songs archive hosted at https://ccrma.stanford.edu/damp/, which contains multiple performances from each of multiple singers, singing different songs, without much verification of the data. This subset has been reduced to 10 performances per singer, with cleaner recordings preselected for each singer.
Users of this dataset must read and accept Smule's Research Data License Agreement (LICENSE.txt).
The Bidding and Tendering Data of China was released on the Chinese Open Data Platform (CnOpenData). The dataset integrates bidding information from more than 100 bidding websites. It also includes a large amount of bidding and tendering information from unofficial platforms. Variables include unit, agency, winning bidder, process status, process status annotation, creation date, tender opening date, industry type, project number, project title, province, budget, etc.
The raw data were wrangled for inclusion in Data Farm. For more information, please see CnOpenData GitLab.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication material for Jackelyn Hwang & Bina Patel Shrimali (2022) Shared and Crowded Housing in the Bay Area: Where Gentrification and the Housing Crisis Meet COVID-19, Housing Policy Debate, DOI: 10.1080/10511482.2022.2099934
Paper Abstract: Amid the growing affordable housing crisis and widespread gentrification over the last decade, people have been moving less than before and increasingly live in shared and often crowded households across the U.S. Crowded housing has various negative health implications, including stress, sleep disorders, and infectious diseases. Difference-in-difference analysis of a unique, large-scale longitudinal consumer credit database of over 450,000 San Francisco Bay Area residents from 2002 to 2020 shows gentrification affects the probability of residents shifting to crowded households across the socioeconomic spectrum but in different ways than expected. Gentrification is negatively associated with low-socioeconomic status (SES) residents’ probability of entering crowded households, and this is largely explained by increased shifts to crowded households in neighborhoods outside of major cities showing early signs of gentrification. Conversely, gentrification is associated with increases in the probability that middle-SES residents enter crowded households, primarily in Silicon Valley. Lastly, crowding is positively associated with COVID-19 case rates, beyond density and socioeconomic and racial composition in neighborhoods, although the role of gentrification remains unclear. Housing policies that mitigate crowding can serve as early interventions in displacement prevention and reducing health inequities.