Dataset Overview
dataset: glove-100-angular
Metadata
Creation Time: 2025-01-07 11:21:16+0000
Update Time: 2025-01-07 11:21:31+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-100-angular.
Dataset Overview
dataset: glove-50-angular
Metadata
Creation Time: 2025-01-07 11:13:35+0000
Update Time: 2025-01-07 11:13:46+0000
Source: https://github.com/erikbern/ann-benchmarks
Task: N/A
Train Samples: N/A
Test Samples: N/A
License: DISCLAIMER AND LICENSE NOTICE:
This dataset is intended for benchmarking and research purposes only. The source data used in this dataset retains its original license and copyright. Users must comply with the respective licenses… See the full description on the dataset page: https://huggingface.co/datasets/open-vdb/glove-50-angular.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heart failure (HF) is the final stage of various heart diseases. The prognosis of HF patients is highly variable, with mortality rates ranging from 5% to 75%. Evaluating the all-cause mortality of HF patients is an important means of avoiding death and improving patient health. In practice, however, machine learning models struggle to achieve good results on HF data with missing values, high dimensionality, and class imbalance. Therefore, a deep learning system is proposed. In this system, we propose an indicator vector that marks whether each value is true or padded, which quickly handles missing values and helps expand the data dimensions. We then use a convolutional neural network with different kernel sizes to extract feature information. A multi-head self-attention mechanism is applied to capture information across all channels, which is essential for improving the system's performance. In addition, the focal loss function is introduced to better handle the class imbalance. The experimental data come from the public MIMIC-III database and contain valid records for 10311 patients. The proposed system quickly and effectively predicts four death types: death within 30 days, death within 180 days, death within 365 days, and death after 365 days. Our study uses Deep SHAP to interpret the deep learning model and obtains the top 15 characteristics. These characteristics further confirm the effectiveness and rationality of the system and help provide better medical services.
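Two ideas named in the abstract above, the indicator vector for padded values and the focal loss, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the pad value of 0.0, the binary (rather than four-class) form of the loss, and the defaults gamma=2.0 and alpha=0.25 are all assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def with_indicator(
    values: Sequence[Optional[float]], pad_value: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Fill missing entries and return a parallel indicator vector.

    indicator[i] is 1 when values[i] was observed and 0 when it was padded.
    The pad value of 0.0 is an assumption, not taken from the abstract.
    """
    filled = [pad_value if v is None else float(v) for v in values]
    indicator = [0 if v is None else 1 for v in values]
    return filled, indicator


def binary_focal_loss(
    p: float, y: int, gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7
) -> float:
    """Focal loss for one binary prediction (hypothetical simplification).

    p is the predicted probability of class 1 and y is the true label (0 or 1).
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples
    so that hard examples dominate training.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Made-up feature row with one missing measurement.
filled, indicator = with_indicator([1.5, None, 3.0])
# Concatenating both vectors doubles the input width, which is one way to
# read "helps expand the data dimensions".
expanded = filled + indicator
```

With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a convenient sanity check.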
https://spdx.org/licenses/CC0-1.0.html
A necessary component of understanding vector-borne disease risk is the accurate characterization of the distributions of their vectors. Species distribution models have been successfully applied to data-rich species but may produce inaccurate results for sparsely-documented vectors. In light of global change, vectors that are currently not well-documented could become increasingly important, requiring tools to predict their distributions. One way to achieve this could be to leverage data on related species to inform the distribution of a sparsely-documented vector based on the assumption that the environmental niches of related species are not independent. Relatedly, there is a natural dependence of the spatial distribution of a disease on the spatial distribution of its vector. Here, we propose to exploit these correlations by fitting a hierarchical model jointly to data on multiple vector species and their associated human diseases to improve distribution models of sparsely-documented species. To demonstrate this approach, we evaluated the ability of twelve models (which differed in their pooling of data from multiple vector species and inclusion of disease data) to improve distribution estimates of sparsely-documented vectors. We assessed our models on two simulated data sets, which allowed us to generalize our results and examine their mechanisms. We found that when the focal species is sparsely documented, incorporating data on related vector species reduces uncertainty and improves accuracy by reducing overfitting. When data on vector species are already incorporated, disease data only marginally improve model performance. However, when data on other vectors are not available, disease data can improve model accuracy and reduce overfitting and uncertainty. We then assessed the approach on empirical data on ticks and tick-borne diseases in Florida and found that incorporating data on other vector species improved model performance.
This study illustrates the value of exploiting correlated data via joint modeling to improve distribution models of data-limited species.
Methods
Vector Data
Vector presence data were obtained from VectorMap and iNaturalist. Only iNaturalist data considered “research grade” were included, and we removed duplicates. To obtain absence data, we referenced VectorMap publications and assumed that if a species was included in a study but not reported at a sampling location, the species was absent at that location. To avoid conflating low sampling effort with low vector presence, we based pseudo-absence locations on presence locations from chiggers, fleas, and mites from both databases and the Global Biodiversity Information Facility. We used a 1:1 ratio of presence to absence points, which produces the most accurate predicted distribution for regression techniques (Barbet-Massin et al., 2012). We artificially sparsely sampled one species within our empirical data (A. maculatum) by including 20% of available presence-absence data in our training set and withholding the rest for testing. The artificial sparse sampling allowed for a robust testing data set to evaluate model performance. To ensure spatial independence between our training and testing data, data were split using the blockCV package (Valavi et al., 2018) in R Version 2023.03.0+386 (R Core Team, 2023). To test the limitations of incorporating disease data, we selected a vector species that does not transmit any of the diseases within our model as our focal species. Empirical sample sizes are given in Supp Table 2.
Human Disease Data
We obtained annual incidence data on three human diseases (anaplasmosis, ehrlichiosis, Lyme disease) from 2011 to 2019 for each county from the Florida Department of Health. We translated this into human disease presence data in a given county in a given year based on whether the annual incidence there was greater than zero.
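The two data-preparation rules described above (incidence greater than zero counts as presence, and 20% of the focal species' records go to training) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: the function names and the sample values are hypothetical, and the simple random split shown here is a stand-in for the spatially blocked split the study performed with the blockCV package in R.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def incidence_to_presence(
    incidence: Dict[Tuple[str, int], float]
) -> Dict[Tuple[str, int], int]:
    """Map (county, year) -> annual incidence to a 0/1 presence flag.

    Presence is 1 when the annual incidence is greater than zero.
    """
    return {key: 1 if value > 0 else 0 for key, value in incidence.items()}


def sparse_sample(
    records: Sequence[T], train_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Keep a fraction of records for training and withhold the rest.

    NOTE: this is a plain random split used only for illustration. It does
    not enforce the spatial independence that a blocked split provides.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Made-up incidence values for one county across two years.
presence = incidence_to_presence({("Alachua", 2011): 0.0, ("Alachua", 2012): 1.3})

# Ten hypothetical record IDs: two go to training, eight to testing.
train, test = sparse_sample(list(range(10)))
```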
Covariate data
We modeled vector distributions as a function of environmental covariates that have been linked to tick presence: land cover (Randolph, 2000), 30-year average maximum temperature (Ogden et al., 2020), 30-year average precipitation (Ogden et al., 2020), regional Palmer hydrological drought index (Jones and Kitron, 2000), normalized difference vegetation index (Randolph, 2000), and distance to the nearest waterbody (Kahl and Alidousti, 1997). We obtained land cover data from the Global Land Cover Characteristics Database (Loveland et al., 2000), 30-year average climate data from WorldClim (Fick and Hijmans, 2017), Palmer Hydrological Drought Index from NOAA (Bushra and Rohli, 2017), and Normalized Difference Vegetation Index data from USGS Landsat (Vermote et al., 2016). Finally, we obtained waterbody data from the World Wildlife Fund’s Global Lakes and Wetlands database (McGwire and Fisher, 2001). Pathogen circulation was based on Companion Animal Parasite Council data, which report seroprevalence in canines receiving veterinary treatment. To avoid considering imported cases as indicative of endemicity, we considered a threshold of five annual cases to signal transmission. Finally, to account for under-reporting (Madison-Antenucci et al., 2020), we modeled reporting probability as a function of health insurance coverage and population size. Insurance data were obtained from County Health Rankings (www.countyhealthrankings.org), and population data were obtained from WorldPop (www.worldpop.org).
Simulated data
Our first simulation generates data for three well-documented species (A. americanum, A. maculatum, D. variabilis) and a single sparsely-documented species (I. scapularis). “Well-documented” is defined as 500 samples and “sparsely-documented” is defined as 30 samples (Supp Figure 2). Our second simulation treats all four species as well-documented (Supp Table 3).
The Sotheby's International Realty dataset provides a premium collection of real estate data, ideal for training AI models and enhancing various business operations in the luxury real estate market. Our data is carefully curated and prepared to ensure seamless integration with your AI systems, allowing you to innovate and optimize your business processes with minimal effort. This dataset is versatile and suitable for small boutique agencies, mid-sized firms, and large real estate enterprises.
Key features include:
Custom Delivery Options: Data can be delivered through Rest-API, Websockets, tRPC/gRPC, or other preferred methods, ensuring smooth integration with your AI infrastructure.
Vectorized Data: Choose from multiple embedding models (LLama, ChatGPT, etc.) and vector databases (Chroma, FAISS, QdrantVectorStore) for optimal AI model performance and vectorized data processing.
Comprehensive Data Coverage: Includes detailed property listings, luxury market trends, customer engagement data, and agent performance metrics, providing a robust foundation for AI-driven analytics.
Ease of Integration: Our dataset is designed for easy integration with existing AI systems, providing the flexibility to create AI-driven analytics, notifications, and other business applications with minimal hassle.
Additional Services: Beyond data provision, we offer AI agent development and integration services, helping you seamlessly incorporate AI into your business workflows.
With this dataset, you can enhance property valuation models, optimize customer engagement strategies, and perform advanced market analysis using AI-driven insights. This dataset is perfect for training AI models that require high-quality, structured data, helping luxury real estate businesses stay competitive in a dynamic market.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison on the Yale face database (results of our proposed algorithm are in bold).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance comparison of the second experiment (results of our proposed algorithm are in bold).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of our model and the baseline method on the three databases.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance measures for neural network tested on DRIVE database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistics of the iHEARu-EAT database.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Model performance for SVM using leave-group-out cross validation and comparison to expert delineated and gSSURGO classified ecological sites.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables with missing values are preprocessed.