The State Emergency Department Databases (SEDD) are part of the family of databases and software tools developed for the Healthcare Cost and Utilization Project (HCUP). The SEDD are a set of databases that capture discharge information on all emergency department visits that do not result in an admission. The SEDD combined with SID discharges that originate in the emergency department are well suited for research and policy questions that require complete enumeration of hospital-based emergency departments within market areas or states. Data may not be available for all states across all years.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Polina Stepanenko
Released under Database: Open Database License, Contents: Database Contents License
The Nationwide Inpatient Sample (NIS) is part of a family of databases and software tools developed for the Healthcare Cost and Utilization Project (HCUP). The NIS is the largest all-payer inpatient health care database in the United States, yielding national estimates of hospital inpatient stays. The NIS can be used to identify, track, and analyze national trends in health care utilization, access, charges, quality, and outcomes. Data may not be available for all states across all years.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Metrics are reported as mean ± standard deviation. Results for the proposed approach are shown using both the second- and higher-order MGRF models. The age range of this group is 6.5–39.1 years.
The Nationwide Readmissions Database (NRD) is a database under the Healthcare Cost and Utilization Project (HCUP) that contains nationally representative information on hospital readmissions for all ages, including all payers and the uninsured. The NRD contains data from approximately 18 million discharges per year (35 million weighted discharges) across most of the United States.
Data elements include:
The NRD consists of four data files:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Since 2002, the Interdisciplinary Melanoma Cooperative Group (IMCG) at Perlmutter Cancer Center has maintained one of the largest clinicopathologic resources, the Melanoma Clinicopathological-Biospecimen Database and Repository, for research on patients 18 years old and over with melanoma or at high risk for melanoma. Clinical data is stored in a secure REDCap database that contains 653 fields to capture clinical and pathological information. The database can be queried for research studies; customized datasets for statistical analyses are created in SAS®. Follow-up data is collected every 3, 6, or 12 months depending on the patient's clinical stage. Biospecimens (i.e., blood/buffy coat, sera, plasma, lymphocytes; and blocks of primary, metastatic, and fresh melanoma tissues) are securely cataloged in LabVantage with linkage to corresponding clinical and pathological data contained in REDCap. Integration of high-quality, annotated biospecimens with clinicopathological data allows applications such as the examination of RNA expression (fresh tissue), protein expression (paraffin-embedded tissue), and germline DNA sequences (blood) from the same patients.
As of March 2023, 5,790 consenting patients (including 399 high-risk patients) have contributed clinical data and 99,039 biospecimens to the project. Of these patients, 2,977 (55%) are male; the mean age at diagnosis was 60 years, with a mean follow-up duration of 55 months. These metrics are subject to change over time.
Prioritization Plan for Biospecimen Distribution
To use the resources in the Melanoma Clinicopathological-Biospecimen Database and Repository, investigators must complete the attached request form. The request is reviewed by the IMCG Biospecimen Committee, consisting of:
The Committee meets monthly to make decisions regarding distribution of biospecimens based on the scientific merit and status of funding, with priority given to investigators with peer-reviewed funding for projects requiring evaluation of specific biospecimens. Prioritization will be as follows:
If a conflict arises between two (or more) competing interests within the same category (e.g., two SPORE research projects), the committee decides based on the following criteria:
For any project that potentially requires prospective collection, the Biospecimen Committee will attempt to acquire enough materials to allow multi-investigator utilization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms for general use (the NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a portion of the dataset of the images collected by the S.M.A.R.T Construction Research Group at NYUAD from a construction site on campus. The dataset contains a subset of the data/images used for the manuscript titled 'Transfer-learning and texture-based features for detailed recognition of the conditions of construction materials with small datasets' by Eyob Mengiste, Karunakar Reddy Mannem, Samuel A. Prieto and Borja García de Soto. Those interested in the complete dataset for research purposes can contact the corresponding author (eyob.mengiste@nyu.edu) for more information.
This partial database contains a total of 208 images covering 7 construction material conditions, broken down as follows: CMU wall - 24 images, chiseled concrete - 49 images, concrete - 18 images, gypsum - 26 images, mesh - 25 images, first coat plaster - 37 images, second coat plaster - 29 images.
The Nationwide Emergency Department Sample (NEDS) is part of a family of databases and software tools developed for the Healthcare Cost and Utilization Project (HCUP). The NEDS is the largest all-payer emergency department (ED) database in the United States, yielding national estimates of hospital-based ED visits. The NEDS enables analyses of ED utilization patterns and supports public health professionals, administrators, policymakers, and clinicians in their decision-making regarding this critical source of care.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by 2 cameras under 6 lighting conditions, 9 elevations, and 18 azimuths.
The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).
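The instance-based split above can be sketched in a few lines of Python. The instance numbers come directly from the text; the helper name is our own illustration, not part of the official distribution.

```python
# Instance numbers per split, as described in the dataset overview.
TRAIN_INSTANCES = {4, 6, 7, 8, 9}
TEST_INSTANCES = {0, 1, 2, 3, 5}

def norb_split(instance: int) -> str:
    """Return which split ('train' or 'test') a toy instance belongs to."""
    if instance in TRAIN_INSTANCES:
        return "train"
    if instance in TEST_INSTANCES:
        return "test"
    raise ValueError(f"instance must be 0-9, got {instance}")

print(norb_split(4), norb_split(5))  # train test
```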
(Figure: sample images from the NORB normalized-uniform dataset.)
The dataset was created by Fu Jie Huang and Yann LeCun, Courant Institute, New York University, October 2005.
The dataset, as well as some of this overview, was taken from the official site.
The files are in a simple binary matrix format, with file postfix ".mat".
PAIRS: Each pair is composed of 2 images (24,300 × 2 = 48,600 total), one left and one right, and is commonly used for experiments in binocular mode. For experiments in monocular mode, use just one of the two images (24,300 total).
The "-cat" files store the category of each image: each "-cat" file contains 24,300 category labels (0 for animal, 1 for human, 2 for plane, 3 for truck, 4 for car).
Each "-info" file stores 24,300 4-dimensional vectors, which contain additional information about the corresponding images:
For regular training and testing, "-dat" and "-cat" files are sufficient. "-info" files are provided in case some other forms of classification or preprocessing are needed.
The files are stored in the so-called "binary matrix" file format, which is a simple format for vectors and multidimensional matrices of various element types. Binary matrix files begin with a file header which describes the type and size of the matrix, and then comes the binary image of the matrix.
The header is best described by a C structure:

struct header {
    int magic;  // 4 bytes
    int ndim;   // 4 bytes, little endian
    int dim0;   // size of the first dimension
    int dim1;
    int dim2;   // at least 3 dimension sizes are always stored
};
(Note that when the matrix has fewer than 3 dimensions, say it is a 1D vector, then dim1 and dim2 are both 1. When the matrix has more than 3 dimensions, the header is followed by further dimension sizes. After the header comes the matrix data, which is stored with the index of the last dimension changing fastest.)
Since the files are generated on an Intel machine, they use the little-endian scheme to encode the 4-byte integers. Pay attention when you read the files on machines that use big-endian.
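As a sketch, the header described above can be parsed with Python's standard `struct` module; the `<` format prefix forces little-endian interpretation regardless of the host machine. The function name and the example magic value below are our own illustration.

```python
import io
import struct

def read_binmatrix_header(f):
    """Parse a binary-matrix header from a file-like object.

    Returns (magic, shape). The header always stores at least 3
    dimension sizes, with trailing 1s for matrices of fewer dimensions.
    """
    magic, ndim = struct.unpack("<ii", f.read(8))   # two little-endian int32s
    n_stored = max(ndim, 3)                         # >= 3 sizes on disk
    stored = struct.unpack("<%di" % n_stored, f.read(4 * n_stored))
    return magic, list(stored[:ndim])

# Synthetic header for a 24300x1 matrix (magic value chosen for illustration)
blob = struct.pack("<5i", 0x1E3D4C54, 2, 24300, 1, 1)
magic, shape = read_binmatrix_header(io.BytesIO(blob))
print(shape)  # [24300, 1]
```

For a 4D "-dat" header, `ndim` is 4 and the header stores all four sizes, so the same function recovers the full 24300x2x96x96 shape.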
The "-dat" files store a 4D tensor of dimensions 24300x2x96x96. Each file contains 24,300 image pairs (each pair has 2 images), and each image is 96x96 pixels.
The "-cat" files store a 2D vector of dimensions 24300x1. The "-info" files store a 2D matrix of dimensions 24300x4.
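Putting the pieces together, a "-cat" file can be read end to end with only the standard library. This is a sketch that assumes the labels are stored as 4-byte little-endian integers (consistent with the header encoding described above); verify the element type against the official site before relying on it.

```python
import io
import struct

def read_cat_file(f):
    """Read a '-cat'-style binary matrix of int32 labels from a file object."""
    magic, ndim = struct.unpack("<ii", f.read(8))
    n_stored = max(ndim, 3)                    # at least 3 sizes are stored
    dims = struct.unpack("<%di" % n_stored, f.read(4 * n_stored))[:ndim]
    count = 1
    for d in dims:
        count *= d
    return list(struct.unpack("<%di" % count, f.read(4 * count)))

# Tiny synthetic 5x1 label "file" (magic value is illustrative only)
blob = struct.pack("<5i", 0x1E3D4C54, 2, 5, 1, 1) + struct.pack("<5i", 0, 1, 2, 3, 4)
print(read_cat_file(io.BytesIO(blob)))  # [0, 1, 2, 3, 4]
```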
A piece of Matlab code showing how to read an example file can be found at the end of the official website.
TERMS / COPYRIGHT
This database is provided for research purposes. It cannot be sold. Publications that include results obtained with this database should reference the following paper:
Y. LeCun, F.J. Huang, L. Bottou, Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) 2004
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
Yann LeCun (Courant Institute, NYU), Corinna Cortes (Google Labs, New York), and Christopher J.C. Burges (Microsoft Research, Redmond).