11 datasets found

Data from: INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis...
stanfordaimi.azurewebsites.net
Updated Jun 26, 2025
+ more versions
Cite
Microsoft Research (2025). INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis [Dataset]. https://stanfordaimi.azurewebsites.net/datasets/151848b9-8b31-4129-bc25-cefdf18f95d8
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Microsoft Research
License
https://aimistanford-web-api.azurewebsites.net/licenses/8de476ec-6092-4502-82f0-3e84aa75788f/view
Description
Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including 23,248 CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). Using our provided dataset, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and fused models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset for enabling reproducible research on strategies for integrating 3D medical imaging and EHR data. EHR modality data is uploaded to the Stanford Redivis website (https://redivis.com/Stanford).
Data from: Fostering cultures of open qualitative research: Dataset 2 –...
orda.shef.ac.uk
xlsx
Updated Oct 8, 2025
Cite
Matthew Hanchard; Itzel San Roman Pineda (2025). Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts [Dataset]. http://doi.org/10.15131/shef.data.23567223.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.15131/shef.data.23567223.v2
Dataset updated
Oct 8, 2025
Dataset provided by
The University of Sheffield
Authors
Matthew Hanchard; Itzel San Roman Pineda
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:

· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book

The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.

The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.

ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:

· 15 x Interview transcripts - in .docx file format, which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.

All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.

Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.

For recruitment, 14 participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.

· 1 x Participant sheet – in .csv format, which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.

The sheet provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.
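As a rough illustration of how the RespondentID column could link the two files, the sketch below joins a participant sheet to survey responses in memory. Only the RespondentID column name comes from the description above; the other column names and all sample values are hypothetical stand-ins, and the real files may be laid out differently.

```python
import csv
import io

# Hypothetical stand-ins for the participant sheet and the Dataset 1 survey file.
# Only the RespondentID column name is taken from the dataset description.
PARTICIPANTS_CSV = """RespondentID,CareerStage
R01,Early career
R02,Professor
"""
SURVEY_CSV = """RespondentID,Answer
R01,Yes
R02,No
R03,Unsure
"""


def join_on_respondent_id(participants_text, survey_text):
    """Attach each interview participant's survey row via RespondentID."""
    survey_by_id = {
        row["RespondentID"]: row
        for row in csv.DictReader(io.StringIO(survey_text))
    }
    joined = []
    for participant in csv.DictReader(io.StringIO(participants_text)):
        # Participants without a survey row keep only their own fields.
        survey_row = survey_by_id.get(participant["RespondentID"], {})
        joined.append({**survey_row, **participant})
    return joined


rows = join_on_respondent_id(PARTICIPANTS_CSV, SURVEY_CSV)
```

Survey respondents who were not interviewed (R03 in the sample) are simply left out of the joined result, which mirrors the fact that only a subset of survey respondents took part in interviews.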

The project was undertaken by two staff:

Co-investigator: Dr. Itzel San Roman Pineda
ORCiD ID: 0000-0002-3785-8057
i.sanromanpineda@sheffield.ac.uk
Postdoctoral Research Assistant
Labelled as ‘Researcher 1’ throughout the dataset

Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard
ORCiD ID: 0000-0003-2460-8638
m.s.hanchard@sheffield.ac.uk
Research Associate
iHuman Institute, Social Research Institutes, Faculty of Social Science
Labelled as ‘Researcher 2’ throughout the dataset
COKI Open Access Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 3, 2023
Cite
Richard Hosking; James P. Diprose; Aniek Roelofs; Tuan-Yow Chien; Lucy Montgomery; Cameron Neylon (2023). COKI Open Access Dataset [Dataset]. http://doi.org/10.5281/zenodo.7048603
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7048603
Dataset updated
Oct 3, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Richard Hosking; James P. Diprose; Aniek Roelofs; Tuan-Yow Chien; Lucy Montgomery; Cameron Neylon
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The COKI Open Access Dataset measures open access performance for 142 countries and 5117 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.

The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution.

Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, Microsoft Academic Graph, Unpaywall and the Research Organization Registry.

Once fetched, the datasets are synthesised to produce aggregate time series statistics for each country and institution in the dataset. The aggregate time series statistics include publication count, open access status and citation count.

See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week.
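For readers unfamiliar with JSON Lines, a minimal sketch of reading such a file and deriving an open access percentage per record might look like the following. The field names (`id`, `year`, `n_outputs`, `n_outputs_open`) and the sample values are hypothetical placeholders chosen for illustration; the actual names are defined by the schema linked above and may differ.

```python
import io
import json

# Hypothetical two-record sample; real field names come from the published
# schema and may not match the ones used here.
SAMPLE_JSONL = """{"id": "AAA", "year": 2020, "n_outputs": 10, "n_outputs_open": 4}
{"id": "AAA", "year": 2021, "n_outputs": 20, "n_outputs_open": 11}
"""


def read_json_lines(stream):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def open_access_share(record):
    """Percentage of a record's outputs that are open access (hypothetical fields)."""
    return 100.0 * record["n_outputs_open"] / record["n_outputs"]


records = list(read_json_lines(io.StringIO(SAMPLE_JSONL)))
shares = {r["year"]: open_access_share(r) for r in records}
```

Reading line by line keeps memory use flat, which matters for a file covering thousands of institutions; for a real download, `io.StringIO(SAMPLE_JSONL)` would be replaced by an opened file handle.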

Code

The COKI Academic Observatory data collection pipeline is used to create the dataset.

The COKI OA Website Github project contains the code for the web app that visualises the dataset at open.coki.ac. It can be found on Zenodo here.

License
COKI Open Access Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

Attributions
This work contains information from:

Microsoft Academic Graph which is made available under the ODC Attribution Licence.

Crossref Metadata via the Metadata Plus program. Bibliographic metadata is made available without copyright restriction and Crossref generated data under a CC0 licence. See metadata licence information for more details.

Unpaywall. The Unpaywall Data Feed is used under license. Data is freely available from Unpaywall via the API, data dumps and as a data feed.

Research Organization Registry which is made available under a CC0 licence.
COVID-19 Open Research Dataset (CORD-19)
data.niaid.nih.gov
marketplace.sshopencloud.eu
Updated Jul 22, 2024
+ more versions
Cite
Lucy Lu Wang (2024). COVID-19 Open Research Dataset (CORD-19) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3715505
Dataset updated
Jul 22, 2024
Dataset provided by
Kyle Lo
JJ Yang
Lucy Lu Wang
Sebastian Kohlmeier
Description
A full description of this dataset along with updated information can be found here.

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.

This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.

By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.

Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

Dataset content:

Commercial use subset

Non-commercial use subset

PMC custom license subset

bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)

Metadata file

Readme

Each paper is represented as a single JSON object (see schema file for details).

Description:

The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS, etc.) from the following sources:

PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)

Additional COVID-19 research articles from a corpus maintained by the WHO

bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note the dataset may contain multiple entries for individual PMC IDs in cases when supplementary materials are available.
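Because one PMC ID can appear more than once, it may help to group parsed entries by ID before analysis. The sketch below shows one way to do that; the `pmcid` and `kind` field names and the sample entries are assumptions made for illustration and are not taken from the dataset's schema file.

```python
from collections import defaultdict


def group_by_pmc_id(entries, id_field="pmcid"):
    """Group parsed entries by PMC ID (the field name is a hypothetical placeholder)."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry[id_field]].append(entry)
    return dict(grouped)


# Hypothetical parsed entries: one paper with a supplementary file, one without.
sample_entries = [
    {"pmcid": "PMC0000001", "kind": "main"},
    {"pmcid": "PMC0000001", "kind": "supplement"},
    {"pmcid": "PMC0000002", "kind": "main"},
]

grouped = group_by_pmc_id(sample_entries)
ids_with_multiple_entries = [k for k, v in grouped.items() if len(v) > 1]
```

Grouping rather than dropping repeats keeps the supplementary material available while making it easy to count each paper once.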

This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.

Citation:

When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:

In bibliography:

COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505

In text:

(CORD-19, 2020)

The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.
Mnist Dataset
universe.roboflow.com
tensorflow.org
+3 more
zip
Updated Aug 8, 2022
Cite
Popular Benchmarks (2022). Mnist Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/mnist-cjkff/model/2
Available download formats: zip
Dataset updated
Aug 8, 2022
Dataset authored and provided by
Popular Benchmarks
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Digits
Description
THE MNIST DATABASE of handwritten digits

Authors:

Yann LeCun, Courant Institute, NYU

Corinna Cortes, Google Labs, New York

Christopher J.C. Burges, Microsoft Research, Redmond

Dataset Obtained From: http://yann.lecun.com/exdb/mnist/

All images were sized 28x28 in the original dataset

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.

Version 1 (original-images_trainSetSplitBy80_20):

Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set

Trained from Roboflow Classification Model's ImageNet training checkpoint

Version 2 (original-images_ModifiedClasses_trainSetSplitBy80_20):

Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set

Modify Classes, a Roboflow preprocessing feature, was employed to change class names from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to zero, one, two, three, four, five, six, seven, eight, nine

Trained from the Roboflow Classification Model's ImageNet training checkpoint

Version 3 (original-images_Original-MNIST-Splits):

Original images, with the original splits for MNIST only: train set (86% of images, 60,000 images) and test set (14% of images, 10,000 images).

This version was not trained
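The split figures quoted for the three versions can be checked with a few lines of arithmetic. This is a sketch under the assumption that Versions 1 and 2 apply the 80/20 split to the original 60,000-image train set exactly as described; the counts actually produced by Roboflow are not stated in the description and may differ slightly.

```python
def split_counts(total, train_fraction):
    """Return (train, validation) sizes for a simple fractional split."""
    train = round(total * train_fraction)
    return train, total - train


ORIGINAL_TRAIN = 60_000  # stated in the description
ORIGINAL_TEST = 10_000   # stated in the description

# Versions 1 and 2: assumed 80/20 split of the original train set.
train_v1, valid_v1 = split_counts(ORIGINAL_TRAIN, 0.80)

# Version 3: share of all images in the original train and test sets.
total_images = ORIGINAL_TRAIN + ORIGINAL_TEST
train_pct = round(100 * ORIGINAL_TRAIN / total_images)
test_pct = round(100 * ORIGINAL_TEST / total_images)
```

The rounded percentages agree with the 86% and 14% quoted for Version 3.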

Citation:

@article{lecun2010mnist,
  title={MNIST handwritten digit database},
  author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
  journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
  volume={2},
  year={2010}
}
Global Forest Ecosystem Structure and Function Data For Carbon Balance...
data.nasa.gov
Updated Apr 1, 2025
Cite
nasa.gov (2025). Global Forest Ecosystem Structure and Function Data For Carbon Balance Research - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/global-forest-ecosystem-structure-and-function-data-for-carbon-balance-research-e3cd6
Dataset updated
Apr 1, 2025
Dataset provided by
NASA http://nasa.gov/
Description
A comprehensive global database has been assembled to quantify CO2 fluxes and pathways across different levels of integration (from photosynthesis up to net ecosystem production) in forest ecosystems. The database fills an important gap for model calibration, model validation, and hypothesis testing at global and regional scales. The database archive includes: a Microsoft Office Access Database; data files for all tables in the database; query outputs from the database; and an SQL script file for re-creating the database from the tables. The database is structured by site (i.e., a forest or stand of known geographical location, biome, species composition, and management regime). It contains carbon budget variables (fluxes and stocks), ecosystem traits (standing biomass, leaf area index, age), and ancillary information (management regime, climate, soil characteristics) for 529 sites from eight forest biomes. Data entries originated from peer-reviewed literature and personal communications with researchers involved in Fluxnet. Flux estimates were included in the database when they were based on direct measurements (e.g., tower-based eddy covariance system measurements), derived from single or multiple direct measurements, or modeled. Stand description was based on observed values, and climatic description was based on the CRU data set and ORCHIDEE model output. Uncertainty for each carbon balance component in the database was estimated in a uniform way by expert judgment. Robustness of CO2 balances was tested, and closure terms were introduced as a numerical way to approach data quality and flux uncertainty at the biome level.
GAPs Data Repository on Return: Guideline, Data Samples and Codebook
zenodo.org
data.niaid.nih.gov
+1 more
Updated Feb 13, 2025
Cite
Zeynep Sahin Mencutek; Fatma Yılmaz-Elmas (2025). GAPs Data Repository on Return: Guideline, Data Samples and Codebook [Dataset]. http://doi.org/10.5281/zenodo.14862490
Unique identifier
https://doi.org/10.5281/zenodo.14862490
Dataset updated
Feb 13, 2025
Dataset provided by
RedCAP
Authors
Zeynep Sahin Mencutek; Fatma Yılmaz-Elmas
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.

This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.

The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.

The transition to a web-based interface builds upon the Repository’s original structure, which was initially developed using REDCap (Research Electronic Data Capture), a secure web application for building and managing online surveys and databases. REDCap ensures systematic data entry and stores the entries on Uppsala University’s servers while significantly improving accessibility and usability as well as data security. It also enables users to export any or all data from the Project when granted full data export privileges. Data can be exported in various ways and formats, including Microsoft Excel, SAS, Stata, R, or SPSS for analysis. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository’s webpage, allowing users to export original data.

This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.

The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.

Explore the GAPs Data Repository at https://data.returnmigration.eu/.
Comparison of R1 and R2 Online Research Data Services
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Szkirpan, Elizabeth (2023). Comparison of R1 and R2 Online Research Data Services [Dataset]. http://doi.org/10.7910/DVN/SHJABB
Unique identifier
https://doi.org/10.7910/DVN/SHJABB
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Szkirpan, Elizabeth
Description
Compiled in mid-2022, this dataset contains the raw data file, randomized ranked lists of R1 and R2 research institutions, and files created to support data visualization for Elizabeth Szkirpan's 2022 study regarding availability of data services and research data information via university libraries for online users. Files are available in Microsoft Excel formats.
Lecture Notes - CS Tools - 2024/2025 – deZarza
figshare.com
pdf
Updated Jul 16, 2025
Cite
I. de Zarzà; J. de Curto (2025). Lecture Notes - CS Tools - 2024/2025 – deZarza [Dataset]. http://doi.org/10.6084/m9.figshare.29582690.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.29582690.v1
Dataset updated
Jul 16, 2025
Dataset provided by
Figshare http://figshare.com/
Authors
I. de Zarzà; J. de Curto
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This compilation of additional lecture materials offers a practical introduction to key Computer Science (CS) and Digital tools and concepts aimed at enhancing research, teaching, and administrative efficiency. Prepared by Dr. I. de Zarzà, and reviewed and edited by Dr. J. de Curtò, the notes are designed as a transversal resource to support students from diverse disciplines, ranging from engineering and business to public management and health sciences.

Topics include:

· Introduction to Programming
· Spreadsheet software and Excel functions
· Word processing and Overleaf (LaTeX)
· Presentation tools including PowerPoint, SlidesAI, and Genially
· Prompt engineering and AI-assisted writing with Copilot and ChatGPT
· Web and blog creation using HTML and Blogger
· Introduction to databases (SQL and NoSQL)
· Cybersecurity fundamentals and safe digital practices
· Multimedia generation with AI (voice, video, and music tools like Suno and Sora)

Developed across various undergraduate programs at the Universidad de Zaragoza, the notes combine technical know-how with real-world applications in academic and public sector contexts.
MRA-MIDAS: Multimodal Image Dataset for AI-based Skin Cancer
stanfordaimi.azurewebsites.net
Updated Jun 18, 2024
Cite
Microsoft Research (2024). MRA-MIDAS: Multimodal Image Dataset for AI-based Skin Cancer [Dataset]. https://stanfordaimi.azurewebsites.net/datasets/f4c2020f-801a-42dd-a477-a1a8357ef2a5
Dataset updated
Jun 18, 2024
Dataset authored and provided by
Microsoft Research
License
https://aimistanford-web-api.azurewebsites.net/licenses/f1f352a6-243f-4905-8e00-389edbca9e83/view
Description
We introduce the Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MRA-MIDAS) dataset, the first publicly available, prospectively-recruited, systematically-paired dermoscopic and clinical image-based dataset across a range of skin-lesion diagnoses. This dataset encompasses a wide array of skin lesions and includes well-annotated, patient-level, clinical metadata. It aims to more accurately mirror real-world clinical scenarios than retrospectively curated datasets and is enhanced by extensive histopathologic confirmation to ensure data integrity. This research was approved by the Institutional Review Board at Stanford University under IRB#36050, along with the Cleveland Clinic Foundation under IRB#20-666, and adhered to the Helsinki Declaration. Patients presenting to the dermatology clinics of participating dermatologists at Stanford Medicine or Cleveland Clinic Foundation between August 18, 2020, and April 17, 2023, were eligible for the study if 1) they had at least one solitary skin lesion of concern identified where a skin biopsy was deemed medically necessary by the dermatologist investigator or 2) patients were directed to in-clinic evaluation for a lesion that was previously identified as concerning through a teledermatology encounter or dermatologist review of a patient photo submitted through the electronic patient messaging portal. Patients underwent written informed consent with either the physician or research coordinator, after which both clinical and dermoscopic digital photography were obtained of any eligible skin lesions. Each lesion underwent standardized photography with a contemporary model iPhone or iPad device (iPhone SE to iPhone 12 Pro and iPod touch to iPad mini) without flash photography at 15-cm and 30-cm distances, along with digital dermatoscope photography. 
For each lesion, clinical information about the patient was obtained and recorded including sex assigned at birth, age, Fitzpatrick skin type, personal history of melanoma, anatomic location, and the lesion’s length and width. Investigators had the discretion to identify additional control lesions that clinically appeared benign on a corresponding contralateral body site that were similarly enrolled for digital photography as an un-biopsied control lesion to include in the dataset, though model analysis was restricted to biopsied lesions. This dataset contains images obtained from patients at Stanford who provided consent for public release of their images and represents the near entirety of cases enrolled at this site. At the time of first enrollment, the Stanford dermatologists at the specialized pigmented lesion and melanoma clinics had an average of 15.7 years of post-residency experience while those in general medical dermatology clinics had an average of 3.9 years’ experience. Dermatologists noted their top-five ranked clinical impressions at the time of evaluation, along with their binary level of confidence (Yes/No) in their top impression. For any biopsied lesions, associated histopathologic final diagnoses were recorded and categorized into a previously described taxonomy. Biopsy results were interpreted by three board-certified dermatopathologists at Stanford. A dermatopathology consensus conference reviewed any diagnosis of severely dysplastic melanocytic nevus or worse. Melanocytic lesions were specifically grouped in the following manner: benign melanocytic nevi, melanomas (including melanoma in-situ and invasive melanoma), and surgically-eligible intermediate melanocytic tumors where complete excision is typically recommended (including severely dysplastic melanocytic nevi and melanocytomas such as typical/atypical Spitz tumors, such as BAP-1-inactivated melanocytic tumors, deep penetrating nevi/tumors, and cellular blue nevi with atypia). 
Cases were included in the dataset if a second reviewing independent board-certified dermatologist agreed with the favored diagnosis based on a review of the associated images.

Funding: This project is based on research supported by the Melanoma Research Alliance (MRA)-L’Oreal Dermatological Beauty Brands Team Science Award, along with philanthropic funding from the David Mair and Vanessa Vu-Mair Artificial Intelligence in Skin Cancer Fund and the Tal & Cinthia Simon Melanoma Research Fund at Stanford Medicine.

Acknowledgments: This material is the result of work supported with resources and the use of facilities at the Veterans Affairs Palo Alto Health Care System in Palo Alto, California.
Cloud-based Database Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 12, 2025
Cite
Data Insights Market (2025). Cloud-based Database Report [Dataset]. https://www.datainsightsmarket.com/reports/cloud-based-database-1454611
Available download formats: pdf, ppt, doc
Dataset updated
Aug 12, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The cloud-based database market is experiencing explosive growth, projected to reach $30.86 billion in 2025 and exhibiting a remarkable Compound Annual Growth Rate (CAGR) of 53.6% from 2019 to 2033. This phenomenal expansion is fueled by several key drivers. The increasing adoption of cloud computing across diverse industries, coupled with the inherent scalability and cost-effectiveness of cloud-based databases, are primary factors. Furthermore, the growing demand for real-time data analytics and the need for robust data management solutions are significantly contributing to market expansion. Businesses are increasingly migrating from on-premise solutions to leverage the agility, enhanced security features, and improved disaster recovery capabilities offered by cloud databases. The market's competitive landscape is dominated by major players like Amazon Web Services, Google, Microsoft, and Oracle, each offering a comprehensive suite of services. However, the emergence of specialized solutions and open-source options like MongoDB and Cassandra is also driving innovation and expanding market accessibility. The shift towards serverless databases and the increasing adoption of managed services are shaping market trends, while challenges like data security concerns and vendor lock-in remain areas of ongoing concern. The forecast period (2025-2033) promises continued growth, with the market expected to surpass $300 billion. This is predicated on the continued adoption of cloud technologies across all sectors, including healthcare, finance, retail, and manufacturing. Further advancements in database technology, such as AI-powered database management systems and improved integration with other cloud services, will continue to propel market expansion. However, potential restraints include the need for skilled professionals to manage and maintain these complex systems, and the ongoing concern about regulatory compliance and data sovereignty. 
The continuous evolution of hybrid cloud deployments will offer a path for organizations seeking a balanced approach between public and private cloud deployments, creating another exciting avenue for market growth. The geographically diverse player base ensures that the market's growth will be felt globally, with regional variations depending on digital infrastructure development and adoption rates.
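To see how the quoted figures relate, the compound growth formula value = base × (1 + CAGR)^years can be applied to the numbers in the description. The sketch below assumes the 53.6% rate is applied year on year starting from the 2025 figure of $30.86 billion; the report does not state how its forecast was computed, and it quotes the rate over 2019 to 2033, so this is an illustration rather than a reconstruction of the report's method.

```python
def project(base_value, cagr, years):
    """Compound a base value forward by a constant annual growth rate."""
    return base_value * (1.0 + cagr) ** years


BASE_YEAR = 2025        # year of the stated market size
BASE_VALUE_BN = 30.86   # USD billions, from the description
CAGR = 0.536            # 53.6%, from the description


def first_year_above(threshold_bn, last_year=2033):
    """First year in the forecast window whose projected value exceeds the threshold."""
    for year in range(BASE_YEAR, last_year + 1):
        if project(BASE_VALUE_BN, CAGR, year - BASE_YEAR) > threshold_bn:
            return year
    return None  # threshold not reached within the window


crossing_year = first_year_above(300.0)
```

Under these assumptions the projection passes $300 billion before the end of the 2025 to 2033 window, which is at least consistent with the description's claim that the market is expected to surpass that figure.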