dvs/90sclub-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Numerous studies on medicines are conducted every day. To address the shortcomings of models for generating, predicting, and classifying medicines information, the authors introduce a large textual dataset of medicines information, named 'MID' (Medicines Information Dataset).
• Value of the data
- MID is, to our knowledge, the largest available and representative Medicines Information Dataset for a wide variety of drugs. It includes the names of over 192k medicines, making it a comprehensive collection of pharmaceutical products.
- Its size makes it robust for generating information about drugs, such as indications or interactions.
- MID offers over 192k rows distributed across 44 therapeutic classes, making it robust for classifying drugs by therapeutic label.
- MID provides accurate, authoritative, and trustworthy information on medicines for enhancing predictions and efficiency in clinical trial management.
- MID includes details such as drug name, information URL, salt composition, drug introduction, therapeutic uses, side effects, drug benefits, how to use the drug, how the drug works, quick tips, safety advice, chemical class, habit-forming status, therapeutic class, and action class. The dataset aims to provide a useful resource for medical researchers, healthcare professionals, drug manufacturers, data scientists, and enthusiasts interested in exploring the world of medicines and healthcare products.
- In contrast with the few small datasets available, MID's size makes it a suitable corpus for implementing both classical and deep learning models.
• MID.xlsx provides the raw data, including medicine information. The data were collected to accelerate work on medicines and save experimental effort by helping to predict, generate, or classify medicine information preclinically.
• Therapeutic_class_counts.xlsx summarizes the distribution of medicines per therapeutic class.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Journey9ni/VLM-3R-DATA dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LUMID is a large-scale, unlabeled collection of over 2 million medical images spanning multiple imaging modalities, including CT scans, X-rays, MRIs, and more. This dataset has been meticulously curated from publicly available medical imaging repositories, addressing the critical challenge of limited scale in existing public datasets and the inaccessibility of high-quality private datasets. The primary motivation behind creating this dataset is to empower the medical imaging community with a resource suited for developing and training advanced deep learning models. By enabling the use of unsupervised and self-supervised learning approaches, this dataset facilitates the learning of rich, transferable representations that can significantly enhance performance across various medical imaging tasks, including classification, segmentation, and anomaly detection.
Key Features:
1) Diversity: Comprising images from multiple modalities and a wide range of medical imaging scenarios.
2) Scalability: A dataset of unprecedented size, providing a robust foundation for training deep neural networks.
3) Versatility: Specifically designed for unsupervised and self-supervised learning methods, fostering innovation in representation learning for medical imaging.
4) Open Access: Built entirely from public datasets, ensuring transparency and reproducibility.
This dataset is intended to serve as a cornerstone for advancing research in medical AI, fostering the development of models capable of generalizing across diverse imaging types and clinical conditions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Brookfield population by age. The dataset can be utilized to understand the age distribution and demographics of Brookfield.
The dataset comprises the following three datasets
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Imperial by race. It includes the population of Imperial across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Imperial across relevant racial categories.
Key observations
The percent distribution of Imperial population by race (across all racial categories recognized by the U.S. Census Bureau): 37.29% are white, 1.03% are Black or African American, 0.71% are American Indian and Alaska Native, 3.90% are Asian, 27.85% are some other race and 29.23% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
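As a quick sanity check on the shares listed above, the sketch below sums them and compares the total against 100%. The tolerance is an assumption of this example: each of the six values is rounded to two decimals, so the total may drift by up to 6 × 0.005 = 0.03 percentage points.

```python
# Race shares for Imperial, copied from the key observations above (percent).
shares = {
    "White": 37.29,
    "Black or African American": 1.03,
    "American Indian and Alaska Native": 0.71,
    "Asian": 3.90,
    "Some other race": 27.85,
    "Multiracial": 29.23,
}

# Assumption: each published value is rounded to 0.01, so allow 0.005 per category.
tolerance = 0.005 * len(shares)

total = sum(shares.values())
# The small epsilon guards against floating-point noise in the comparison.
is_consistent = abs(total - 100.0) <= tolerance + 1e-9

print(f"total = {total:.2f}%, consistent with 100%: {is_consistent}")
```

A total slightly off 100% is expected from rounding alone and does not by itself indicate missing categories.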
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Imperial Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The algorithmic trading space is buzzing with new strategies. Companies have spent billions on infrastructure and R&D to jump ahead of the competition and beat the market. Finding value in stocks is an art that very few have mastered. Can a computer do it?
This dataset contains 200+ financial indicators commonly found in the 10-K filings that each publicly traded company releases yearly, covering US stocks for 2018.
## Target Variables
The dataset includes two class labels:
1. PRICE VAR [%]: Lists the percent price variation for 2018
2. class: Binary classification for each stock, where:
   - 1: Identifies stocks that a hypothetical trader should BUY
   - 0: Identifies stocks that a hypothetical trader should NOT BUY
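Since both columns are targets, neither should be passed to a model as an input feature when predicting the other. The sketch below shows one way to separate them from the indicators. The rows are made up and the indicator names (`Revenue`, `EPS`) are hypothetical; only the two target column names come from the description above.

```python
TARGET_COLUMNS = ("PRICE VAR [%]", "class")


def split_features_and_targets(row):
    """Return (features, targets) dicts for one stock record."""
    features = {key: value for key, value in row.items() if key not in TARGET_COLUMNS}
    targets = {key: row[key] for key in TARGET_COLUMNS}
    return features, targets


# Made-up example rows; indicator names and values are illustrative only.
rows = [
    {"Revenue": 1.2e9, "EPS": 3.1, "PRICE VAR [%]": 12.5, "class": 1},
    {"Revenue": 4.0e8, "EPS": -0.4, "PRICE VAR [%]": -8.0, "class": 0},
]

X = []  # feature dicts with both target columns removed
y = []  # binary BUY / NOT BUY labels
for row in rows:
    features, targets = split_features_and_targets(row)
    X.append(features)
    y.append(targets["class"])
```

Keeping the target names in one tuple means a regression on `PRICE VAR [%]` can reuse the same split and simply read the other target.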
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Alzheimer's Disease Multiclass Dataset contains approximately 44,000 MRI images categorized into four distinct classes based on the severity of Alzheimer's disease. This dataset is intended for use in machine learning model training and testing. All images are skull-stripped and clean of non-brain tissue.
Dataset Structure
The dataset is organized into the following four directories, each representing a different class of disease severity:
- NonDemented: Contains 12,800 MRI images of subjects with no signs of dementia.
- VeryMildDemented: Contains 11,200 MRI images of subjects with very mild symptoms of dementia.
- MildDemented: Contains 10,000 MRI images of subjects with mild dementia.
- ModerateDemented: Contains 10,000 MRI images of subjects with moderate dementia.
Image Details
- Total Number of Images: 44,000
- Image Format: MRI scans as .JPG files
- Image Usage: Suitable for training and testing machine learning models focused on classifying Alzheimer's disease stages.
Disease Severity Classification
The dataset follows a severity ranking system for Alzheimer's disease:
- NonDemented: No dementia.
- Very Mild Demented: Early signs of dementia, very mild symptoms.
- Mild Demented: Clear signs of dementia, but still mild.
- Moderate Demented: More pronounced symptoms of dementia, moderate severity.
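The per-class counts add up to the stated total of 44,000 images. A short check, which also prints each class's share of the dataset; the dictionary simply restates the numbers given above:

```python
# Image counts per class directory, as listed in the dataset structure above.
class_counts = {
    "NonDemented": 12_800,
    "VeryMildDemented": 11_200,
    "MildDemented": 10_000,
    "ModerateDemented": 10_000,
}

total_images = sum(class_counts.values())

# Share of each class, useful for spotting any remaining imbalance.
class_shares = {name: count / total_images for name, count in class_counts.items()}

for name, share in class_shares.items():
    print(f"{name}: {share:.1%}")
```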
This dataset is an augmented and upsampled version of the dataset below: https://www.kaggle.com/datasets/uraninjo/augmented-alzheimer-mri-dataset-v2
This dataset was upsampled as the original dataset had a large class imbalance.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Automatic License Plate Recognition (ALPR) System: Use the "License Plates" model to develop an ALPR system for traffic management, toll collection, and parking access control, making these processes more efficient and accurate.
Stolen Vehicle Tracking and Recovery: Integrate the "License Plates" model into security and surveillance systems to identify and track stolen vehicles in real-time, helping law enforcement to locate and recover them more efficiently.
Traffic Violation Detection: Combine the model with other computer vision and sensor technologies to detect traffic violations, such as speeding, illegal parking, or running red lights, and automatically generate citations based on license plate identification.
Vehicle Data Collection and Analytics: Use the "License Plates" model for data collection and analytics on traffic patterns, vehicle types, and license plate distribution in specific areas. This information can be used to optimize urban planning, infrastructure development, and transportation policies.
Enhanced Augmented Reality Navigation: Implement the "License Plates" model in augmented reality applications for drivers, allowing them to receive information about nearby vehicles, such as make and model, or routing assistance based on license plate detection and computations.
The dataset represents a compilation of user interaction data generated by users who participated in the project's pilot activities in Patras, Greece. Data was generated by users in the SMARTBUY app and includes information about users, stores, product categories, professions, and events.
The dataset comprises the following data:
- users: user account data for the Patras pilot users
- occupation: all possible occupations that the pilot users could choose from
- stores: stores which participated in the Patras pilot
- sel_products_cat: products uploaded to the SMARTBUY platform by retailers
- events: geo-stamped and time-stamped descriptions of a user interaction event (for instance, "user_id 67 rated product_id 722 with rating 4 at location x1 at datetime y1", or "user_id 91 denoted product_id 78 as favorite at location x2 at datetime y2")
- event_types: all possible event types captured by the SMARTBUY platform ('Product searches', 'Product views', 'Featured product', 'Products near you views', 'Product photos browsed', 'Product ratings', 'Clicks on Read More button to read product reviews', 'Clicks on Open map button', 'Clicks on Send this info by email button', 'Products denoted as Favorite')
Privacy-sensitive information such as user names, retailer owner names and store names and keywords searched are anonymized.
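To make the shape of an event record concrete, here is a minimal sketch that models the two example events quoted above and counts events per type. The field names, the string used for the location, and the timestamps are all assumptions for illustration; the actual column layout of the events table is not described here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    user_id: int
    product_id: int
    event_type: str                # one of the entries listed under event_types
    location: str                  # placeholder; the real geo-stamp format is not described
    timestamp: datetime
    rating: Optional[int] = None   # only meaningful for 'Product ratings' events


# The two example events from the description, with made-up timestamps.
events = [
    Event(67, 722, "Product ratings", "x1", datetime(2017, 1, 1, 10, 0), rating=4),
    Event(91, 78, "Products denoted as Favorite", "x2", datetime(2017, 1, 1, 11, 30)),
]

# Tally how many events of each type were recorded.
events_per_type = Counter(event.event_type for event in events)
```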
This repo consists of the datasets used for the TaCo paper. There are four datasets:
- Multilingual Alpaca-52K GPT-4 dataset
- Multilingual Dolly-15K GPT-4 dataset
- TaCo dataset
- Multilingual Vicuna Benchmark dataset
We translated the first three datasets using Google Cloud Translation. The TaCo dataset is created by using the TaCo approach as described in our paper, combining the Alpaca-52K and Dolly-15K datasets. If you would like to create the TaCo dataset for a specific language, you can… See the full description on the dataset page: https://huggingface.co/datasets/saillab/taco-datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Windsor V1 is a dataset for object detection tasks - it contains Road Defects annotations for 211 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Atticus Open Contract Dataset (AOK) (beta) is a corpus of 5,000+ labels in 200 commercial legal contracts that have been manually labeled by legal experts to identify 40 types of clauses that are important during contract review in connection with corporate transactions, such as mergers and acquisitions, IPOs, and corporate financing.
AOK Dataset is curated and maintained by The Atticus Project, Inc., a non-profit organization, to support NLP research and development in legal contract review.
If you download this dataset, we'd love to know more about you and your project! Please fill out this short form: https://forms.gle/h47GUENTTbBqH39m7.
Check out our website at atticusprojectai.org.
Update: The expanded 1.0 version of the dataset is available here https://zenodo.org/record/4595826
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was constructed from the test split of the VoxCeleb 2 dataset (VoxCeleb). The VoxCeleb 2 test set contains 118 speakers, each appearing in several different videos. To develop this dataset, only one video per speaker was selected. A face image was extracted from each video, along with a low-resolution version of the face image (8x8). The age, gender, and ethnicity of the person in the face image were determined using the “DeepFace” library, a face recognition and facial attribute analysis library.
This dataset can be used to evaluate speech2face, speech conditioned face generation and speech conditioned face super-resolution systems.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control; 6.25, 12.5, 25 and 50 µM for 6-OHDA; and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and was standardized across features. The small number of samples prevented a full and robust statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.
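The description does not say which standardization was applied. One common choice is the z-score, sketched below under that assumption; the use of the population standard deviation and the sample values are also assumptions of this example, not details from the study.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each feature column: (value - column mean) / column std."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    stds = [pstdev(column) for column in columns]
    return [
        # A constant column has zero spread, so map it to 0.0 instead of dividing by zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up measurements: 4 samples x 2 features (not values from the study).
raw = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
]

scaled = standardize_columns(raw)
```

After this transformation every feature has mean 0 and unit spread, so features measured on different scales contribute comparably to distance-based methods such as hierarchical clustering.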
Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
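For readers unfamiliar with the gain ratio criterion named in the decision tree settings, the following is a minimal sketch of its textbook definition (information gain divided by the split information) for a binary split. It is not Orange's implementation, which may differ in details; the two class labels are borrowed from the class list above and the splits are made up.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(labels, left_mask):
    """Gain ratio of a binary split; left_mask[i] is True if sample i goes left."""
    left = [y for y, go_left in zip(labels, left_mask) if go_left]
    right = [y for y, go_left in zip(labels, left_mask) if not go_left]
    if not left or not right:
        return 0.0  # a split that sends everything one way carries no information
    n = len(labels)
    child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    information_gain = entropy(labels) - child_entropy
    split_information = entropy(left_mask)  # entropy of the partition sizes
    return information_gain / split_information


labels = ["Control"] * 4 + ["6-OHDA 50 µM"] * 4
perfect_split = [True] * 4 + [False] * 4            # separates the two classes
uninformative_split = [True, True, False, False] * 2  # mixes them evenly

print(gain_ratio(labels, perfect_split), gain_ratio(labels, uninformative_split))
```

Dividing by the split information penalizes splits that fragment the data into many small branches, which matters with only 4 samples per class.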
This dataset pools all (15) climate stations of the selected federal state, specified in the entry_name, with 10 realizations per station: 150 files of 4'241'439 bytes each according to the German description (4'292'439 bytes according to the English one). The dataset is zip-compressed.
Data: (ASCII) Dataset acronym: WR2010_EH5_1_A1B_MV_KL Dataset name: UBA-WETTREG ECHAM5/OM 20C + A1B Run 1 1961-2100 for the selected federal state, climate stations
File structure of a climate station file: (header lines) station name, latitude, longitude, height, type, ta.mo.jahr (day.month.year) TX TM TN RR RF PP DD SD NN FF
Station list: Stationsliste_MV_KL.txt, with station number, station name, abbreviation of the federal state, latitude, longitude, station height above sea level, type
There are no years with a leap day. Missing values are indicated with -999.0.
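Two details above matter when processing these files: the -999.0 missing-value marker and the absence of leap days. The sketch below handles the marker and works out the number of daily records implied by 1961-2100. It assumes one record per day, which the description suggests but does not state, and the sample values are made up.

```python
MISSING_VALUE = -999.0          # missing-value marker stated in the description
FIRST_YEAR, LAST_YEAR = 1961, 2100
DAYS_PER_YEAR = 365             # the dataset contains no leap days


def clean_values(values):
    """Replace the -999.0 marker with None so it cannot skew statistics."""
    return [None if value == MISSING_VALUE else value for value in values]


def mean_ignoring_missing(values):
    """Average the present values; return None if every value is missing."""
    present = [value for value in clean_values(values) if value is not None]
    return sum(present) / len(present) if present else None


# Records expected per file, assuming one record per day (an assumption).
expected_days = (LAST_YEAR - FIRST_YEAR + 1) * DAYS_PER_YEAR

# Made-up values for the TX column with one missing entry.
tx_sample = [12.0, -999.0, 14.0]
print(expected_days, mean_ignoring_missing(tx_sample))
```

Because every year has 365 days, standard calendar libraries will disagree with the file's day count in leap years, so dates are best derived from the file's own date column.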
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data in support of the article entitled "Experiential modulation of social dominance in a SYNGAP1 rat model of ASD" in the European Journal of Neuroscience.
Advances in our understanding of developmental brain disorders such as autism spectrum disorders (ASD) are being achieved through human neurogenetics in, for example, identifying de novo mutations in SYNGAP1 as one relatively common cause of ASD. A recently developed rat line lacking the calcium/lipid binding (C2) and GTPase activation protein (GAP) domain may further help understanding the neurobiology of deficits seen in children with ASD. This study focused on social dominance in the tube test using Syngap+/D-GAP (rats heterozygous for the ) as alterations in social behaviour are a key facet of the human phenotype. Male animals of this line living together formed a stable intra-cage hierarchy but when living with WT cage-mates, they were submissive, modelling the social withdrawal seen in ASD, with detailed analysis of the specific behaviours shown in social interactions by dominant and submissive animals. A further suggestive observation was that when the Syngap+/D-GAP mutants that had been living together had dominance encounters with WT animals from other cages, the two higher ranking Syngap+/D-GAP rats were dominant whereas the two lower ranking mutants showed the opposite pattern of being submissive. These findings confirm earlier observations with a rat model of Fragile-X indicating that although genotype may be a major determinant of intra-cage hierarchies, the experience of winning or losing can have an influence on subsequent encounters with others. Our results highlight and model that even with single-gene mutations, dominance phenotypes reflect an interaction between genotypic and environmental factors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Dublin by race. It includes the population of Dublin across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Dublin across relevant racial categories.
Key observations
The percent distribution of Dublin population by race (across all racial categories recognized by the U.S. Census Bureau): 87.46% are white, 0.28% are Black or African American, 0.43% are Asian, 5.54% are some other race and 6.29% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Dublin Population by Race & Ethnicity.
Potential Evapotranspiration (PET) is the amount of evaporation that would occur if there were an unlimited supply of water. This dataset represents Potential Evapotranspiration for well-watered grass and is available either with an Interception element (PETI) or without one (PET). The dataset has been calculated from homogenised climate station data, gridded to a 1 km grid. The dataset starts in 1961, is available on a water day (09:00 to 09:00 on the following day) timestep and is updated frequently.
Attribution statement: © Environment Agency copyright and/or database right 2024. All rights reserved.
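Because the water-day timestep described above runs from 09:00 to 09:00, a timestamp before 09:00 belongs to the water day that began on the previous calendar date. A minimal sketch of that mapping follows; labelling each water day by its start date is an assumption of this example, as is the function name.

```python
from datetime import date, datetime, time, timedelta

WATER_DAY_START = time(9, 0)  # water days run 09:00 to 09:00 on the following day


def water_day(timestamp: datetime) -> date:
    """Return the calendar date on which the timestamp's water day began.

    Labelling a water day by its start date is an assumption of this sketch.
    """
    if timestamp.time() < WATER_DAY_START:
        return (timestamp - timedelta(days=1)).date()
    return timestamp.date()


# 08:59 on 2 March still falls in the water day that started on 1 March.
print(water_day(datetime(2024, 3, 2, 8, 59)))
print(water_day(datetime(2024, 3, 2, 9, 0)))
```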
All states (including the District of Columbia) are required to provide data to The Centers for Medicare & Medicaid Services (CMS) on a range of Medicaid and Children’s Health Insurance Program (CHIP) indicators related to key application, eligibility, enrollment and call center processes. These data reflect enrollment activity for all populations receiving comprehensive Medicaid and CHIP benefits in all states, as well as state program performance. States submit this data via the Performance Indicator dataset. Further information about this dataset is available at: https://www.medicaid.gov/medicaid/national-medicaid-chip-program-information/medicaid-chip-enrollment-data/performance-indicator-technical-assistance/index.html.