Hirai-Labs/alpr-vlm-instruct-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Numerous studies on medicines are conducted every day. To address the shortcomings of models for generating, predicting, and classifying medicines information, the authors introduce a large textual dataset of medicines information, which they named "MID".
• Value of the data
- MID is, to the authors' knowledge, the largest available and representative Medicines Information Dataset (MID) for a wide variety of drugs. It includes the names of over 192k medicines, making it a comprehensive collection of pharmaceutical products.
- Its size makes MID robust for generating information about drugs, such as indications or interactions.
- MID offers over 192k rows distributed across 44 therapeutic classes, making it robust for classifying drugs by therapeutic label.
- MID provides accurate, authoritative, and trustworthy information on medicines for enhancing predictions and efficiency in clinical trial management.
- MID includes details such as drug name, information URL, salt composition, drug introduction, therapeutic uses, side effects, drug benefits, how to use the drug, how the drug works, quick tips, safety advice, chemical class, habit-forming status, therapeutic class, and action class. This dataset aims to provide a useful resource for medical researchers, healthcare professionals, drug manufacturers, data scientists, and enthusiasts interested in exploring the world of medicines and healthcare products.
- In contrast with the few small datasets available, MID's size makes it a suitable corpus for implementing both classical and deep learning models.
• MID.xlsx provides the raw data, including medicine information. The data were collected to accelerate research on medicines and save experimental effort by helping to predict, generate, or classify medicine information preclinically.
• Therapeutic_class_counts.xlsx summarizes the distribution of medicines per therapeutic class.
dvs/90sclub-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LUMID is a large-scale, unlabeled collection of over 2 million medical images spanning multiple imaging modalities, including CT scans, X-rays, MRIs, and more. This dataset has been meticulously curated from publicly available medical imaging repositories, addressing the critical challenge of limited scale in existing public datasets and the inaccessibility of high-quality private datasets. The primary motivation behind creating this dataset is to empower the medical imaging community with a resource suited for developing and training advanced deep learning models. By enabling the use of unsupervised and self-supervised learning approaches, this dataset facilitates the learning of rich, transferable representations that can significantly enhance performance across various medical imaging tasks, including classification, segmentation, and anomaly detection.
Key Features: 1) Diversity: Comprising images from multiple modalities and a wide range of medical imaging scenarios. 2) Scalability: A dataset of unprecedented size, providing a robust foundation for training deep neural networks. 3) Versatility: Specifically designed for unsupervised and self-supervised learning methods, fostering innovation in representation learning for medical imaging. 4) Open Access: Built entirely from public datasets, ensuring transparency and reproducibility.
This dataset is intended to serve as a cornerstone for advancing research in medical AI, fostering the development of models capable of generalizing across diverse imaging types and clinical conditions.
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target ChEMBL_ID: CHEMBL3068 (TID: 11184). It has 112 rows and 1024 features (not including the molecule ID and class feature columns, molecule_id and pXC50). The features represent FCFP 1024-bit molecular fingerprints, which were generated from SMILES strings using the Pipeline Pilot program (Dassault Systèmes BIOVIA). Generating fingerprints does not usually require missing value imputation, as all bits are generated.
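As a minimal sketch of how a table with this layout might be prepared for modelling, the Python snippet below separates the identifier (`molecule_id`) and target (`pXC50`) columns named above from the fingerprint bits. The bit column names (`FCFP_0`, `FCFP_1`, ...), the molecule IDs, and the toy values are assumptions for illustration only; the real file may label its columns differently.

```python
from typing import Dict, List, Tuple

ID_COL = "molecule_id"   # identifier column named in the description
TARGET_COL = "pXC50"     # activity column named in the description


def split_features(
    rows: List[Dict[str, str]],
) -> Tuple[List[str], List[List[int]], List[float]]:
    """Split parsed rows into molecule IDs, bit-vector features, and targets."""
    if not rows:
        return [], [], []
    # Every column other than the ID and the target is treated as a fingerprint bit.
    feature_cols = [c for c in rows[0] if c not in (ID_COL, TARGET_COL)]
    ids = [row[ID_COL] for row in rows]
    X = [[int(row[c]) for c in feature_cols] for row in rows]
    y = [float(row[TARGET_COL]) for row in rows]
    return ids, X, y


# Toy rows with 4 hypothetical bit columns instead of the real 1024.
toy_rows = [
    {"molecule_id": "M1", "FCFP_0": "1", "FCFP_1": "0", "FCFP_2": "0", "FCFP_3": "1", "pXC50": "6.2"},
    {"molecule_id": "M2", "FCFP_0": "0", "FCFP_1": "1", "FCFP_2": "0", "FCFP_3": "0", "pXC50": "5.1"},
]
ids, X, y = split_features(toy_rows)
```

Rows in this shape could come from `csv.DictReader`; because every bit is present, no imputation step is needed before fitting a model.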
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Fort Plain median household income by race. The dataset can be utilized to understand the racial distribution of Fort Plain income.
The dataset will have the following datasets when applicable
Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
Explore our comprehensive data analysis and visual representations for a deeper understanding of Fort Plain median household income by race. You can refer to the same here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in Clayton. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.
Key observations: Insights from 2023
Based on our analysis of ACS 2019-2023 5-Year Estimates, we present the following observations:
- All workers, aged 15 years and older: In Clayton, the median income for all workers aged 15 years and older, regardless of work hours, was $43,125 for males and $33,611 for females. These income figures indicate a substantial gender-based pay disparity, with a gap of approximately 22% between the median incomes of males and females in Clayton. With women, regardless of work hours, earning 78 cents for each dollar earned by men, this income disparity reveals a concerning trend toward wage inequality that demands attention in the town of Clayton.
- Full-time workers, aged 15 years and older: In Clayton, for all full-time workers aged 15 years and older, the median income was equal, at $49,514 for both males and females. This indicates a gender income balance in Clayton, where men and women in full-time, year-round roles earn an equal income. Curiously, across all roles (full-time and others), there was a notable disparity between the median incomes of women and men. This hints at a considerable reduction in the income gap within full-time roles, potentially indicating progress towards income equality for women in these roles within Clayton.
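The percentages quoted in these observations follow from simple arithmetic on the two medians. The short Python sketch below reproduces them; the function name `pay_gap` is an illustrative assumption and not part of the dataset or of any Neilsberg tooling.

```python
from typing import Tuple


def pay_gap(male_median: float, female_median: float) -> Tuple[float, float]:
    """Return (female-to-male earnings ratio, gap as a fraction of the male median)."""
    ratio = female_median / male_median
    return ratio, 1.0 - ratio


# Median incomes for all workers in Clayton, as quoted above.
ratio, gap = pay_gap(43_125, 33_611)
print(f"ratio: {ratio:.2f}, gap: {gap:.0%}")  # ratio: 0.78, gap: 22%

# Full-time, year-round workers: equal medians give no gap.
ft_ratio, ft_gap = pay_gap(49_514, 49_514)
```

Expressing the gap relative to the male median is what yields the "78 cents to each dollar" reading; using the female median as the base would give a different percentage.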
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.
Gender classifications include:
Employment type classifications include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Clayton median household income by race. You can refer to the same here.
A subset of the LendingClub DataSet obtained from Kaggle: https://www.kaggle.com/wordsforthewise/lending-club
LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform.
The dataset represents a compilation of user interaction data generated by users who participated in the project's pilot activities in Patras, Greece. Data was generated by users in the SMARTBUY app and includes information about users, stores, product categories, professions, and events.
The dataset comprises the following data:
- users: user account data for the Patras pilot users
- occupation: all possible occupations that the pilot users could choose from
- stores: stores which participated in the Patras pilot
- sel_products_cat: products uploaded to the SMARTBUY platform by retailers
- events: geo-stamped and time-stamped descriptions of a user interaction event (for instance, "user_id 67 rated product_id 722 with rating 4 at location x1 at datetime y1", or "user_id 91 denoted product_id 78 as favorite at location x2 at datetime y2")
- event_types: all possible event types captured by the SMARTBUY platform ('Product searches', 'Product views', 'Featured product', 'Products near you views', 'Product photos browsed', 'Product ratings', 'Clicks on Read More button to read product reviews', 'Clicks on Open map button', 'Clicks on Send this info by email button', 'Products denoted as Favorite')
Privacy-sensitive information such as user names, retailer owner names and store names and keywords searched are anonymized.
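To make the shape of an event record more concrete, here is a hedged Python sketch of how one might model the two example events quoted in the description and count events per type. Every field name in the `Event` class is a hypothetical choice for illustration; the actual column names in the SMARTBUY tables are not given in the description.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One user interaction event; all field names are hypothetical."""
    user_id: int
    product_id: int
    event_type: str               # one of the event types listed in the description
    location: str                 # placeholder for the geo-stamp
    timestamp: str                # placeholder for the time-stamp
    rating: Optional[int] = None  # only set for rating events


# The two example events from the description, with placeholder stamps.
events = [
    Event(67, 722, "Product ratings", "x1", "y1", rating=4),
    Event(91, 78, "Products denoted as Favorite", "x2", "y2"),
]

# Count how many events of each type were recorded.
counts = Counter(e.event_type for e in events)
```

Keeping `rating` optional reflects that only some event types carry a rating, while every event carries a user, a product, a location, and a time.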
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Automatic License Plate Recognition (ALPR) System: Use the "License Plates" model to develop an ALPR system for traffic management, toll collection, and parking access control, making these processes more efficient and accurate.
Stolen Vehicle Tracking and Recovery: Integrate the "License Plates" model into security and surveillance systems to identify and track stolen vehicles in real-time, helping law enforcement to locate and recover them more efficiently.
Traffic Violation Detection: Combine the model with other computer vision and sensor technologies to detect traffic violations, such as speeding, illegal parking, or running red lights, and automatically generate citations based on license plate identification.
Vehicle Data Collection and Analytics: Use the "License Plates" model for data collection and analytics on traffic patterns, vehicle types, and license plate distribution in specific areas. This information can be used to optimize urban planning, infrastructure development, and transportation policies.
Enhanced Augmented Reality Navigation: Implement the "License Plates" model in augmented reality applications for drivers, allowing them to receive information about nearby vehicles, such as make and model, or routing assistance based on license plate detection and computations.
This repo consists of the datasets used for the TaCo paper. There are four datasets:
- Multilingual Alpaca-52K GPT-4 dataset
- Multilingual Dolly-15K GPT-4 dataset
- TaCo dataset
- Multilingual Vicuna Benchmark dataset
We translated the first three datasets using Google Cloud Translation. The TaCo dataset is created by using the TaCo approach as described in our paper, combining the Alpaca-52K and Dolly-15K datasets. If you would like to create the TaCo dataset for a specific language, you can… See the full description on the dataset page: https://huggingface.co/datasets/saillab/taco-datasets.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset, titled "Anabolic Steroids", provides a meticulously curated compilation of nearly 50 steroids. It includes detailed information on their original names, common names, medicinal applications, abuse potential, side effects, historical context, and relative molecular mass (RMM). The dataset aims to serve as a resource for exploring the dual nature of anabolic steroids: both their therapeutic benefits and their misuse in sports and bodybuilding.
Anabolic steroids are synthetic derivatives of testosterone that have been used for decades in medicine to treat conditions like anemia, muscle-wasting diseases, and hormone deficiencies. However, they are also widely abused for performance enhancement and aesthetic purposes. This dataset captures a comprehensive view of these compounds, making it valuable for researchers, educators, and data enthusiasts.
While this dataset is relatively small (approx 50 entries), it offers rich opportunities for exploratory analysis and domain-specific insights. Potential applications include:
Exploratory Data Analysis (EDA):
Domain-Specific Insights:
Educational Use:
This dataset has been ethically compiled from publicly available sources such as scientific journals, chemical databases, and educational websites. No proprietary or confidential information has been included. The data was aggregated to ensure accuracy and relevance while respecting intellectual property rights.
The following sources were instrumental in compiling this dataset:
1. PubChem Database: For verifying chemical properties and molecular mass values.
2. Wikipedia: For historical context and general information on anabolic steroids.
3. NIST Chemistry WebBook: For accurate molecular mass values and chemical details.
4. Scientific Journals: Referenced for medicinal uses, side effects documentation, and abuse patterns.
5. DALL·E 3 by OpenAI: Used to generate illustrative images related to anabolic steroids to complement dataset visualizations.
The misuse of anabolic steroids poses significant health risks and ethical concerns. While anabolic steroids have legitimate medical applications, their abuse for performance enhancement or aesthetic purposes can lead to severe physical and psychological side effects. Common adverse effects include liver damage, cardiovascular strain, hormonal imbalances, infertility, aggression, and mental health issues such as depression. Prolonged misuse can also result in irreversible damage to vital organs and an increased risk of life-threatening conditions like heart attacks or strokes. Beyond individual health risks, steroid abuse undermines the integrity of sports and creates unfair advantages in competitive environments. It is crucial to prioritize natural methods of achieving fitness goals and seek professional guidance for any medical conditions requiring treatment.
This dataset is not intended for machine learning due to its small size but serves as an excellent resource for exploratory data analysis (EDA), visualization projects, and domain-specific research into anabolic steroids' pharmacology and societal impact.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
YoloV8Corrosion is a dataset for semantic segmentation tasks - it contains Corrosion annotations for 770 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Windsor V1 is a dataset for object detection tasks - it contains Road Defects annotations for 211 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Image Train Acc2.2 is a dataset for object detection tasks - it contains Objects EAos F4cY annotations for 1,460 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
## Overview
Snakes is a dataset for classification tasks - it contains Snakes annotations for 3,061 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY-NC-SA 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Road Cracks 2 is a dataset for object detection tasks - it contains Cracks 4dM9 annotations for 543 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Train_14 is a dataset for object detection tasks - it contains Car Truck Bike Bus annotations for 300 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data in support of the article entitled "Experiential modulation of social dominance in a SYNGAP1 rat model of ASD" in the European Journal of Neuroscience.
Advances in our understanding of developmental brain disorders such as autism spectrum disorders (ASD) are being achieved through human neurogenetics, for example in identifying de novo mutations in SYNGAP1 as one relatively common cause of ASD. A recently developed rat line lacking the calcium/lipid binding (C2) and GTPase activation protein (GAP) domain may further help in understanding the neurobiology of deficits seen in children with ASD. This study focused on social dominance in the tube test using Syngap+/D-GAP (rats heterozygous for the ), as alterations in social behaviour are a key facet of the human phenotype. Male animals of this line living together formed a stable intra-cage hierarchy, but when living with WT cage-mates they were submissive, modelling the social withdrawal seen in ASD, with detailed analysis of the specific behaviours shown in social interactions by dominant and submissive animals. A further suggestive observation was that when the Syngap+/D-GAP mutants that had been living together had dominance encounters with WT animals from other cages, the two higher-ranking Syngap+/D-GAP rats were dominant, whereas the two lower-ranking mutants showed the opposite pattern of being submissive. These findings confirm earlier observations with a rat model of Fragile-X indicating that although genotype may be a major determinant of intra-cage hierarchies, the experience of winning or losing can influence subsequent encounters with others. Our results highlight and model that even with single-gene mutations, dominance phenotypes reflect an interaction between genotypic and environmental factors.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset was constructed from the test set split of the VoxCeleb 2 dataset (VoxCeleb). The VoxCeleb 2 test set contains 118 speakers, each appearing in several different videos. To develop this dataset, only one video per speaker was selected. A face image was also extracted from the video, as well as a low-resolution face image (8x8). The age, gender, and ethnicity of the person in the face image were determined using the "DeepFace" library, a face recognition and facial attribute analysis library.
This dataset can be used to evaluate speech2face, speech conditioned face generation and speech conditioned face super-resolution systems.