The National Center for Advancing Translational Sciences (NCATS) has systematically compiled clinical, laboratory and diagnostic data from electronic health records to support COVID-19 research efforts via the National COVID Cohort Collaborative (N3C) Data Enclave. As of August 2, 2022, the repository contains information from over 15 million patients (including 5.8 million COVID-19 positive patients) across the United States.
The N3C Data Enclave is organized into 3 levels of data with varying access restrictions:
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Cambrian-Alignment Dataset
Please see paper & website for more information:
https://cambrian-mllm.github.io/ https://arxiv.org/abs/2406.16860
Overview
Cambrian-Alignment is a question-answering alignment dataset composed of alignment data from LLaVA, Mini-Gemini, Allava, and ShareGPT4V.
Getting Started with Cambrian Alignment Data
Before you start, ensure you have sufficient storage space to download and process the data.
Download the Data Repository… See the full description on the dataset page: https://huggingface.co/datasets/nyu-visionx/Cambrian-Alignment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 14 verified Store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 5 verified Home improvement store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 1 verified Pet supply store business in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Pisa Experiments
This repository contains PisaBench, training data, and model checkpoints introduced in PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop.
PisaBench
Real World Videos
We curate a dataset comprising 361 videos demonstrating the dropping task. Each video begins with an object suspended by an invisible wire in the first frame. We cut the video clips to begin as soon as the… See the full description on the dataset page: https://huggingface.co/datasets/nyu-visionx/pisa-experiments.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 6 verified Building materials store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This repository contains a range of Arena-Hard-Auto benchmark artifacts sourced as part of the 2024 paper Style Outweighs Substance.
Repository Structure
Model Responses for Arena Hard Auto Questions: data/ArenaHardAuto/model_answer
Our standard reference model for pairwise comparisons was gpt-4-0314.
Our standard set of comparison models was:
Llama-3-8B Variants: bagel-8b-v1.0, Llama-3-8B-Magpie-Align-SFT-v0.2, Llama-3-8B-Magpie-Align-v0.2, Llama-3-8B-Tulu-330K, Llama-3-8B-WildChat… See the full description on the dataset page: https://huggingface.co/datasets/nyu-dice-lab/sos-artifacts.
Since 2002, the Interdisciplinary Melanoma Cooperative Group (IMCG) at Perlmutter Cancer Center has maintained one of the largest clinicopathologic resources, the Melanoma Clinicopathological-Biospecimen Database and Repository, for research on patients 18 years old and over with melanoma or at high risk for melanoma. Clinical data is stored in a secure REDCap database that contains 653 fields to capture clinical and pathological information. The database can be queried for research studies; customized datasets for statistical analyses are created in SAS®. Follow-up data is collected every 3, 6, or 12 months depending on the patient's clinical stage. Biospecimens (i.e., blood/buffy coat, sera, plasma, lymphocytes; and blocks of primary, metastatic, and fresh melanoma tissues) are securely cataloged in LabVantage with linkage to corresponding clinical and pathological data contained in REDCap. Integration of high-quality, annotated biospecimens with clinicopathological data allows applications such as the examination of RNA expression (fresh tissue), protein expression (paraffin embedded tissue), and germline DNA sequences (blood) from the same patients.
As of March 2023, 5,790 consenting patients (including 399 high-risk patients) have contributed clinical data and 99,039 biospecimens to the project. 2,977 (55%) of patients are male; the mean age at diagnosis was 60 years, with a mean follow-up duration of 55 months. These metrics are subject to change over time.
Prioritization Plan for Biospecimen Distribution
To use the resources in the Melanoma Clinicopathological-Biospecimen Database and Repository, investigators need to fill out the attached request form. The request is reviewed by the IMCG Biospecimen Committee, consisting of:
The Committee meets monthly to make decisions regarding distribution of biospecimens based on the scientific merit and status of funding, with priority given to investigators with peer-reviewed funding for projects requiring evaluation of specific biospecimens. Prioritization will be as follows:
If a conflict arises between two (or more) competing interests within the same category (e.g., two SPORE research projects), the committee decides based on the following criteria:
For any project that potentially requires prospective collection, the Biospecimen Committee will attempt to acquire enough materials to allow multi-investigator utilization.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 5 verified Appliance store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 2 verified Hardware store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
This dataset was first described in the "Stanford 3D Objects" section of the paper Disentangling by Subspace Diffusion. The data consists of 100,000 renderings each of the Bunny and Dragon objects from the Stanford 3D Scanning Repository. More objects may be added in the future, but only the Bunny and Dragon are used in the paper. Each object is rendered with a uniformly sampled illumination from a point on the 2-sphere, and a uniformly sampled 3D rotation. The true latent states are provided as NumPy arrays along with the images. The lighting is given as a 3-vector with unit norm, while the rotation is provided both as a quaternion and a 3x3 orthogonal matrix.
There are many similarities between S3O4D and existing ML benchmark datasets like NORB, 3D Chairs, 3D Shapes and many others, which also include renderings of a set of objects under different pose and illumination conditions. However, none of these existing datasets include the full manifold of rotations in 3D - most include only a subset of changes to elevation and azimuth. S3O4D images are sampled uniformly and independently from the full space of rotations and illuminations, meaning the dataset contains objects that are upside down and illuminated from behind or underneath. We believe that this makes S3O4D uniquely suited for research on generative models where the latent space has non-trivial topology, as well as for general manifold learning methods where the curvature of the manifold is important.
To use this dataset:
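import tensorflow_datasets as tfds
# Split names are dataset-specific; if 'train' is not defined,
# inspect tfds.builder('s3o4d').info.splits for the available names.
ds = tfds.load('s3o4d', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)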
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/s3o4d-1.0.0.png
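The S3O4D latent states described above (a unit-norm lighting 3-vector, plus a rotation given as both a quaternion and a 3x3 orthogonal matrix) can be illustrated with a small NumPy sketch. This is not the authors' rendering or sampling code: the helper names `sample_unit_vector` and `quaternion_to_matrix` are hypothetical, and the (w, x, y, z) quaternion ordering is an assumption that may differ from the ordering stored in the dataset. It only shows one standard way to draw a uniform direction and a uniform rotation and to check that the two rotation representations agree.

```python
import numpy as np


def sample_unit_vector(rng, dim):
    """Draw a point uniformly from the unit sphere embedded in `dim` dimensions.

    Normalizing an isotropic Gaussian sample gives a uniform direction.
    """
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)


def quaternion_to_matrix(q):
    """Convert a unit quaternion to a 3x3 rotation matrix.

    Assumes (w, x, y, z) ordering; the dataset's ordering may differ.
    """
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


rng = np.random.default_rng(0)
light = sample_unit_vector(rng, 3)  # illumination direction on the 2-sphere
quat = sample_unit_vector(rng, 4)   # uniform unit quaternion, i.e. a uniform 3D rotation
rot = quaternion_to_matrix(quat)    # the same rotation as an orthogonal matrix
```

Because `quat` and `-quat` describe the same rotation, any comparison against stored quaternions would likely need to be sign-invariant; comparing the resulting matrices avoids that ambiguity.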
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 8 verified Fabric store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 4 verified Chinaware store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 1 verified Flooring store business in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 1 verified Dairy store business in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 6 verified Cosmetics store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 2 verified Dollar store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 1 verified Auto parts store business in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 2 verified Shoe store businesses in Nyu District, Fukui, Japan with complete contact information, ratings, reviews, and location data.