License: https://aimistanford-web-api.azurewebsites.net/licenses/8de476ec-6092-4502-82f0-3e84aa75788f/view
Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including 23,248 CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). Using our provided dataset, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE-related tasks. We evaluate image-only, EHR-only, and fused models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset for enabling reproducible research on strategies for integrating 3D medical imaging and EHR data. The EHR modality data has been uploaded to the Stanford Redivis website (https://redivis.com/Stanford).
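The description compares image-only, EHR-only, and fused models without saying how fusion is done. As a rough illustration only, the sketch below shows one common option, late fusion, where each single-modality model outputs a probability for a PE-related outcome and the fused score is a weighted average. The helper name `late_fusion` and the default weight are hypothetical; this is not the INSPECT baseline implementation.

```python
def late_fusion(p_image: float, p_ehr: float, w_image: float = 0.5) -> float:
    """Weighted average of two per-modality probabilities (hypothetical helper)."""
    if not 0.0 <= w_image <= 1.0:
        raise ValueError("w_image must be in [0, 1]")
    for p in (p_image, p_ehr):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    # w_image controls how much the image model is trusted relative to the EHR model.
    return w_image * p_image + (1.0 - w_image) * p_ehr


# Example: image model outputs 0.8, EHR model outputs 0.4, equal weighting.
fused = late_fusion(0.8, 0.4)
print(fused)
```

Late fusion keeps each modality's model independent, which makes image-only and EHR-only models directly comparable with the fused one; other fusion designs (for example, concatenating learned features before a shared classifier) are equally plausible.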
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield, as part of its ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:
· 15 x Interview transcripts - in .docx file format which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.
All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.
Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.
For recruitment, 14 participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.
· 1 x Participant sheet – in .csv format which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.
This sheet provides socio-demographic details on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.
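The RespondentID column is the key that links the participant sheet to the survey responses. A minimal sketch of that join, assuming both files are CSV exports sharing a column literally named `RespondentID`; the other column names and the toy rows are invented for illustration and do not come from the real files.

```python
import csv
import io


def join_on_respondent_id(participants_csv: str, survey_csv: str) -> list[dict]:
    """Attach each interviewee's survey row(s) using the shared RespondentID key."""
    survey_by_id: dict[str, list[dict]] = {}
    for row in csv.DictReader(io.StringIO(survey_csv)):
        survey_by_id.setdefault(row["RespondentID"], []).append(row)

    joined = []
    for person in csv.DictReader(io.StringIO(participants_csv)):
        # A participant recruited outside the survey may have no matching row.
        person["survey_rows"] = survey_by_id.get(person["RespondentID"], [])
        joined.append(person)
    return joined


# Hypothetical toy data, not taken from the real dataset files.
participants = "RespondentID,Pseudonym,CareerStage\nR01,Alex,Early\nR99,Sam,Senior\n"
survey = "RespondentID,Q1\nR01,Yes\nR02,No\n"
for p in join_on_respondent_id(participants, survey):
    print(p["Pseudonym"], len(p["survey_rows"]))
```

For the real files, read each one with `open(path, newline="", encoding="utf-8")` and pass `f.read()` in place of the toy strings.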
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda, Postdoctoral Research Assistant (ORCiD ID: 0000-0002-3785-8057; i.sanromanpineda@sheffield.ac.uk). Labelled as ‘Researcher 1’ throughout the dataset.
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard, Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science (ORCiD ID: 0000-0003-2460-8638; m.s.hanchard@sheffield.ac.uk). Labelled as ‘Researcher 2’ throughout the dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COKI Open Access Dataset measures open access performance for 142 countries and 5117 institutions and is available in JSON Lines format. The data is visualised at the COKI Open Access Dashboard: https://open.coki.ac/.
The COKI Open Access Dataset is created with the COKI Academic Observatory data collection pipeline, which fetches data about research publications from multiple sources, synthesises the datasets and creates the open access calculations for each country and institution.
Each week a number of specialised research publication datasets are collected. The datasets that are used for the COKI Open Access Dataset release include Crossref Metadata, Microsoft Academic Graph, Unpaywall and the Research Organization Registry.
After the datasets are fetched, they are synthesised to produce aggregate time series statistics for each country and institution in the dataset. These statistics include publication count, open access status and citation count.
See https://open.coki.ac/data/ for the dataset schema. A new version of the dataset is deposited every week.
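JSON Lines stores one JSON object per line, so a weekly release can be streamed record by record rather than loaded as one large document. The sketch below computes an open access share per entity; the field names `name`, `year`, `n_outputs` and `n_outputs_open` are hypothetical placeholders, not the documented COKI schema, which should be checked at https://open.coki.ac/data/ before use.

```python
import json
from collections import defaultdict
from typing import Iterable


def open_access_share(lines: Iterable[str]) -> dict[str, float]:
    """Sum outputs per entity over all years and return the open access fraction.

    Field names are hypothetical placeholders, not the documented COKI schema.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [open outputs, all outputs]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        totals[record["name"]][0] += record["n_outputs_open"]
        totals[record["name"]][1] += record["n_outputs"]
    return {name: open_ / total for name, (open_, total) in totals.items() if total}


# Invented example records, one JSON object per line.
sample = [
    '{"name": "Example University", "year": 2020, "n_outputs": 100, "n_outputs_open": 60}',
    '{"name": "Example University", "year": 2021, "n_outputs": 100, "n_outputs_open": 80}',
]
print(open_access_share(sample))
```

Because the function accepts any iterable of lines, an open file handle (for a hypothetical local copy, `open("coki.jsonl", encoding="utf-8")`) can be passed directly.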
Code
License
COKI Open Access Dataset © 2022 by Curtin University is licenced under CC BY 4.0.
Attributions
This work contains information from:
A full description of this dataset along with updated information can be found here.
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of scholarly articles, including full text content, about COVID-19 and the coronavirus family of viruses for use by the global research community.
This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus will be updated weekly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
By downloading this dataset you are agreeing to the Dataset license. Specific licensing information for individual articles in the dataset is available in the metadata file.
Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.
Dataset content:
Commercial use subset
Non-commercial use subset
PMC custom license subset
bioRxiv/medRxiv subset (pre-prints that are not peer reviewed)
Metadata file
Readme
Each paper is represented as a single JSON object (see schema file for details).
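Since each paper is one JSON object, extracting readable text is a matter of walking its paragraph lists. A minimal sketch, assuming the key layout `paper_id`, `metadata.title`, and paragraph lists `abstract` and `body_text` whose items carry a `text` field; treat these keys as assumptions and verify them against the schema file shipped with the version you download.

```python
import json


def paper_text(paper: dict) -> tuple[str, str]:
    """Return (title, full text) from one paper object.

    Keys are assumed from the CORD-19 schema layout; verify before use.
    """
    title = paper.get("metadata", {}).get("title", "")
    paragraphs = [p["text"] for p in paper.get("abstract", [])]
    paragraphs += [p["text"] for p in paper.get("body_text", [])]
    # Separate paragraphs with a blank line.
    return title, "\n\n".join(paragraphs)


# Invented toy paper in the assumed layout.
raw = json.dumps({
    "paper_id": "example-id",
    "metadata": {"title": "A toy paper"},
    "abstract": [{"text": "Short abstract.", "section": "Abstract"}],
    "body_text": [{"text": "Body paragraph.", "section": "Introduction"}],
})
title, text = paper_text(json.loads(raw))
print(title)
```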
Description:
The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS and MERS) from the following sources:
PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
Additional COVID-19 research articles from a corpus maintained by the WHO
bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)
We also provide a comprehensive metadata file of coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).
We recommend using metadata from the comprehensive file when available, instead of parsed metadata in the dataset. Please note that the dataset may contain multiple entries for individual PMC IDs in cases where supplementary materials are available.
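Because one PMC ID can appear more than once, it helps to group entries before counting papers. A minimal sketch, assuming the metadata file is a CSV with a column named `pmcid` (check the header of the version you download); the toy rows are invented.

```python
import csv
import io
from collections import defaultdict


def group_by_pmcid(metadata_csv: str) -> dict[str, list[dict]]:
    """Group metadata rows by PMC ID; rows without an ID are kept under ''."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        # `or ""` guards against missing columns, which DictReader fills with None.
        groups[(row.get("pmcid") or "").strip()].append(row)
    return dict(groups)


# Invented toy metadata rows.
toy = (
    "cord_uid,pmcid,title\n"
    "a1,PMC1,Paper one\n"
    "a2,PMC1,Paper one (supplement)\n"
    "a3,PMC2,Paper two\n"
)
groups = group_by_pmcid(toy)
duplicates = {k: len(v) for k, v in groups.items() if k and len(v) > 1}
print(duplicates)
```

Which row to keep from each group (for example, the main article rather than a supplement) depends on the analysis and is left to the reader.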
This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar. A coalition including the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service.
Citation:
When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:
In bibliography:
COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-MM-DD. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed YYYY-MM-DD. 10.5281/zenodo.3715505
In text:
(CORD-19, 2020)
The Allen Institute for AI and particularly the Semantic Scholar team will continue to provide updates to this dataset as the situation evolves and new research is released.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
Train set split to provide 80% of its images to the training set and 20% of its images to the validation set.
Class labels remapped from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to one, two, three, four, five, six, seven, eight, nine.
The original dataset provides a train set (86% of images, 60,000 images) and a test set (14% of images, 10,000 images) only.
@article{lecun2010mnist,
title={MNIST handwritten digit database},
author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
volume={2},
year={2010}
}
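The 80/20 split of the 60,000-image training set works out to 48,000 training and 12,000 validation images (and 60,000 of the 70,000 original images is about 85.7%, which the page rounds to 86%). A minimal sketch of such a split using a seeded shuffle of indices; the seed and the use of a plain uniform shuffle (rather than a split stratified by digit) are assumptions, since the page does not say how the split was drawn.

```python
import random


def train_val_split(n_images: int, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle image indices and cut them into train/validation lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_images * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = train_val_split(60_000)
print(len(train_idx), len(val_idx))  # 48000 12000
```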
A comprehensive global database has been assembled to quantify CO2 fluxes and pathways across different levels of integration (from photosynthesis up to net ecosystem production) in forest ecosystems. The database fills an important gap for model calibration, model validation, and hypothesis testing at global and regional scales. The database archive includes: a Microsoft Office Access database; data files for all tables in the database; query outputs from the database; and an SQL script file for re-creating the database from the tables. The database is structured by site (i.e., a forest or stand of known geographical location, biome, species composition, and management regime). It contains carbon budget variables (fluxes and stocks), ecosystem traits (standing biomass, leaf area index, age), and ancillary information (management regime, climate, soil characteristics) for 529 sites from eight forest biomes. Data entries originated from peer-reviewed literature and personal communications with researchers involved in Fluxnet. Flux estimates were included in the database when they were based on direct measurements (e.g., tower-based eddy covariance system measurements), derived from single or multiple direct measurements, or modeled. Stand descriptions were based on observed values, and climatic descriptions were based on the CRU data set and ORCHIDEE model output. Uncertainty for each carbon balance component in the database was estimated in a uniform way by expert judgment. The robustness of CO2 balances was tested, and closure terms were introduced as a numerical way to approach data quality and flux uncertainty at the biome level.
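The "levels of integration (from photosynthesis up to net ecosystem production)" follow the standard carbon-budget chain. A sketch of the usual identities, using common symbols (GPP for gross primary production, $R_a$ and $R_h$ for autotrophic and heterotrophic respiration, NPP for net primary production, NEP for net ecosystem production). The closure term $\delta$ is written here in one generic form; it is an assumption, not necessarily the exact definition used in the database.

```latex
\begin{aligned}
\mathrm{NPP} &= \mathrm{GPP} - R_a \\
\mathrm{NEP} &= \mathrm{NPP} - R_h = \mathrm{GPP} - (R_a + R_h) = \mathrm{GPP} - R_{\mathrm{eco}} \\
\delta &= \mathrm{GPP} - R_a - R_h - \mathrm{NEP}_{\mathrm{measured}}
\end{aligned}
```

Under this form, a closure term near zero means the independently estimated components agree with the measured NEP, while a large $|\delta|$ points to inconsistent entries or large flux uncertainty at that site or biome.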
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The GAPs Data Repository provides a comprehensive overview of available qualitative and quantitative data on national return regimes, now accessible through an advanced web interface at https://data.returnmigration.eu/.
This updated guideline outlines the complete process, starting from the initial data collection for the return migration data repository to the development of a comprehensive web-based platform. Through iterative development, participatory approaches, and rigorous quality checks, we have ensured a systematic representation of return migration data at both national and comparative levels.
The Repository organizes data into five main categories, covering diverse aspects and offering a holistic view of return regimes: country profiles, legislation, infrastructure, international cooperation, and descriptive statistics. These categories, further divided into subcategories, are based on insights from a literature review, existing datasets, and empirical data collection from 14 countries. The selection of categories prioritizes relevance for understanding return and readmission policies and practices, data accessibility, reliability, clarity, and comparability. Raw data is meticulously collected by the national experts.
The transition to a web-based interface builds upon the Repository’s original structure, which was initially developed using REDCap (Research Electronic Data Capture), a secure web application for building and managing online surveys and databases. REDCap ensures systematic data entry and stores the data on Uppsala University’s servers while significantly improving accessibility, usability, and data security. It also enables users who have been granted full data export privileges to export any or all data from the Project. Data can be exported in various ways and formats, including Microsoft Excel, SAS, Stata, R, or SPSS for analysis. At this stage, the Data Repository design team also converted tailored records of available data into public reports accessible to anyone with a unique URL, without the need to log in to REDCap or obtain permission to access the GAPs Project Data Repository. Public reports can be used to share information with stakeholders or external partners without granting them access to the Project or requiring them to set up a personal account. Currently, all public report links inserted in this report are also available on the Repository’s webpage, allowing users to export original data.
This report also includes a detailed codebook to help users understand the structure, variables, and methodologies used in data collection and organization. This addition ensures transparency and provides a comprehensive framework for researchers and practitioners to effectively interpret the data.
The GAPs Data Repository is committed to providing accessible, well-organized, and reliable data by moving to a centralized web platform and incorporating advanced visuals. This Repository aims to contribute inputs for research, policy analysis, and evidence-based decision-making in the return and readmission field.
Explore the GAPs Data Repository at https://data.returnmigration.eu/.
Compiled in mid-2022, this dataset contains the raw data file, randomized ranked lists of R1 and R2 research institutions, and files created to support data visualization for Elizabeth Szkirpan's 2022 study regarding the availability of data services and research data information via university libraries for online users. Files are available in Microsoft Excel formats.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This compilation of additional lecture materials offers a practical introduction to key Computer Science (CS) and digital tools and concepts aimed at enhancing research, teaching, and administrative efficiency. Prepared by Dr. I. de Zarzà and reviewed and edited by Dr. J. de Curtò, the notes are designed as a transversal resource to support students from diverse disciplines, ranging from engineering and business to public management and health sciences. Topics include:
· Introduction to Programming
· Spreadsheet software and Excel functions
· Word processing and Overleaf (LaTeX)
· Presentation tools including PowerPoint, SlidesAI, and Genially
· Prompt engineering and AI-assisted writing with Copilot and ChatGPT
· Web and blog creation using HTML and Blogger
· Introduction to databases (SQL and NoSQL)
· Cybersecurity fundamentals and safe digital practices
· Multimedia generation with AI (voice, video, and music tools like Suno and Sora)
Developed across various undergraduate programs at the Universidad de Zaragoza, the notes combine technical know-how with real-world applications in academic and public sector contexts.
License: https://aimistanford-web-api.azurewebsites.net/licenses/f1f352a6-243f-4905-8e00-389edbca9e83/view
We introduce the Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MRA-MIDAS) dataset, the first publicly available, prospectively-recruited, systematically-paired dermoscopic and clinical image-based dataset across a range of skin-lesion diagnoses. This dataset encompasses a wide array of skin lesions and includes well-annotated, patient-level, clinical metadata. It aims to mirror real-world clinical scenarios more accurately than retrospectively curated datasets and is enhanced by extensive histopathologic confirmation to ensure data integrity. This research was approved by the Institutional Review Board at Stanford University under IRB#36050, along with the Cleveland Clinic Foundation under IRB#20-666, and adhered to the Helsinki Declaration. Patients presenting to the dermatology clinics of participating dermatologists at Stanford Medicine or Cleveland Clinic Foundation between August 18, 2020, and April 17, 2023, were eligible for the study if 1) they had at least one solitary skin lesion of concern identified where a skin biopsy was deemed medically necessary by the dermatologist investigator or 2) they were directed to in-clinic evaluation for a lesion that was previously identified as concerning through a teledermatology encounter or dermatologist review of a patient photo submitted through the electronic patient messaging portal. Patients provided written informed consent with either the physician or a research coordinator, after which both clinical and dermoscopic digital photographs were obtained of any eligible skin lesions. Each lesion underwent standardized photography with a contemporary model iPhone or iPad device (iPhone SE to iPhone 12 Pro and iPod touch to iPad mini) without flash at 15-cm and 30-cm distances, along with digital dermatoscope photography.
For each lesion, clinical information about the patient was obtained and recorded, including sex assigned at birth, age, Fitzpatrick skin type, personal history of melanoma, anatomic location, and the lesion’s length and width. Investigators had the discretion to identify additional control lesions that clinically appeared benign on a corresponding contralateral body site; these were similarly enrolled for digital photography as un-biopsied control lesions to include in the dataset, though model analysis was restricted to biopsied lesions. This dataset contains images obtained from patients at Stanford who provided consent for public release of their images and represents the near entirety of cases enrolled at this site. At the time of first enrollment, the Stanford dermatologists at the specialized pigmented lesion and melanoma clinics had an average of 15.7 years of post-residency experience, while those in general medical dermatology clinics had an average of 3.9 years’ experience. Dermatologists noted their top-five ranked clinical impressions at the time of evaluation, along with their binary level of confidence (Yes/No) in their top impression. For any biopsied lesions, associated histopathologic final diagnoses were recorded and categorized into a previously described taxonomy. Biopsy results were interpreted by three board-certified dermatopathologists at Stanford. A dermatopathology consensus conference reviewed any diagnosis of severely dysplastic melanocytic nevus or worse. Melanocytic lesions were specifically grouped in the following manner: benign melanocytic nevi; melanomas (including melanoma in situ and invasive melanoma); and surgically-eligible intermediate melanocytic tumors where complete excision is typically recommended (including severely dysplastic melanocytic nevi and melanocytomas such as typical/atypical Spitz tumors, BAP-1-inactivated melanocytic tumors, deep penetrating nevi/tumors, and cellular blue nevi with atypia).
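The three-way grouping of melanocytic lesions described above can be expressed as a lookup table. A minimal sketch; the diagnosis strings are illustrative labels written for this example and will not necessarily match the values used in the dataset's own metadata.

```python
from typing import Optional

# Hypothetical diagnosis labels mapped to the three melanocytic groups described above.
MELANOCYTIC_GROUPS = {
    "benign melanocytic nevus": "benign",
    "melanoma in situ": "melanoma",
    "invasive melanoma": "melanoma",
    "severely dysplastic melanocytic nevus": "intermediate",
    "spitz tumor": "intermediate",
    "atypical spitz tumor": "intermediate",
    "bap-1-inactivated melanocytic tumor": "intermediate",
    "deep penetrating nevus": "intermediate",
    "cellular blue nevus with atypia": "intermediate",
}


def melanocytic_group(diagnosis: str) -> Optional[str]:
    """Return 'benign', 'melanoma' or 'intermediate'; None if the label is not listed."""
    # Normalise whitespace and case before the lookup.
    return MELANOCYTIC_GROUPS.get(diagnosis.strip().lower())


print(melanocytic_group("Invasive melanoma"))  # melanoma
```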
Cases were included in the dataset if a second reviewing independent board-certified dermatologist agreed with the favored diagnosis based on a review of the associated images. Funding: This project is based on research supported by the Melanoma Research Alliance (MRA)- L’Oreal Dermatological Beauty Brands Team Science Award, along with philanthropic funding from the David Mair and Vanessa Vu-Mair Artificial Intelligence in Skin Cancer Fund and the Tal & Cinthia Simon Melanoma Research Fund at Stanford Medicine. Acknowledgments: This material is the result of work supported with resources and the use of facilities at the Veterans Affairs Palo Alto Health Care System in Palo Alto, California.
License: https://www.datainsightsmarket.com/privacy-policy
The cloud-based database market is experiencing explosive growth, projected to reach $30.86 billion in 2025 and exhibiting a remarkable Compound Annual Growth Rate (CAGR) of 53.6% from 2019 to 2033. This phenomenal expansion is fueled by several key drivers. The increasing adoption of cloud computing across diverse industries and the inherent scalability and cost-effectiveness of cloud-based databases are primary factors. Furthermore, the growing demand for real-time data analytics and the need for robust data management solutions are significantly contributing to market expansion. Businesses are increasingly migrating from on-premise solutions to leverage the agility, enhanced security features, and improved disaster recovery capabilities offered by cloud databases. The market's competitive landscape is dominated by major players like Amazon Web Services, Google, Microsoft, and Oracle, each offering a comprehensive suite of services. However, the emergence of specialized solutions and open-source options like MongoDB and Cassandra is also driving innovation and expanding market accessibility. The shift towards serverless databases and the increasing adoption of managed services are shaping market trends, while data security and vendor lock-in remain ongoing concerns. The forecast period (2025-2033) promises continued growth, with the market expected to surpass $300 billion. This is predicated on the continued adoption of cloud technologies across all sectors, including healthcare, finance, retail, and manufacturing. Further advancements in database technology, such as AI-powered database management systems and improved integration with other cloud services, will continue to propel market expansion. However, potential restraints include the need for skilled professionals to manage and maintain these complex systems, and ongoing concerns about regulatory compliance and data sovereignty.
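The growth figures can be sanity-checked with the compound-growth formula V_end = V_start × (1 + r)^n. A minimal sketch, assuming the 53.6% CAGR applies to every year from the 2025 base of $30.86 billion through 2033 (eight compounding periods); the report does not state how its projection was computed.

```python
def project(value_start: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return value_start * (1.0 + cagr) ** years


# Assumption: 2025 base value, 53.6% per year, compounded through 2033.
value_2033 = project(30.86, 0.536, 2033 - 2025)
print(f"${value_2033:.0f} billion")
```

Under that assumption the 2033 value comes out at roughly $956 billion, consistent with (and well above) the report's statement that the market will surpass $300 billion.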
The continuous evolution of hybrid cloud deployments will offer a path for organizations seeking a balanced approach between public and private cloud deployments, creating another exciting avenue for market growth. The geographically diverse player base ensures that the market's growth will be felt globally, with regional variations depending on digital infrastructure development and adoption rates.