MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
LMDiskANN.jl (v1.2.0) is a Julia package that implements the Low-Memory Disk Approximate Nearest-Neighbor (LM-DiskANN) algorithm, extending DiskANN-style graph search to handle billion-scale vector datasets while keeping RAM usage to a minimum. It stores adjacency lists on disk via memory-mapped files, performs tunable best-first graph traversals for fast and accurate queries, and supports dynamic insertions and deletions with automatic pruning to maintain a compact index. The library exposes knobs to balance recall against latency, and it optionally pairs a LevelDB key–value store with the node IDs for flexible external key lookup. These capabilities make LMDiskANN.jl well-suited for embedding retrieval, recommendation systems, and other large-scale similarity-search workloads that need high throughput on commodity hardware.
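LMDiskANN.jl itself is written in Julia, and the sketch below does not use its API. It is a minimal Python illustration of the best-first (beam) traversal that DiskANN-style indexes perform. It assumes an in-memory dict of adjacency lists and Euclidean distance; in LM-DiskANN the neighbor lists would instead be read from the memory-mapped file. The parameter name `beam_width` is hypothetical and stands in for the recall/latency knob: a larger beam explores more nodes, which usually raises recall at the cost of latency.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def beam_search(graph, vectors, entry, query, k, beam_width):
    """Best-first search over a proximity graph (illustrative sketch only).

    graph: dict node_id -> list of neighbor ids (stand-in for on-disk adjacency lists)
    vectors: dict node_id -> vector
    beam_width: hypothetical name for the candidate-list size (recall vs. latency)
    """
    visited = {entry}
    d0 = l2(vectors[entry], query)
    frontier = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]     # max-heap (negated) of the best beam_width nodes seen
    while frontier:
        dist, node = heapq.heappop(frontier)
        # Stop once the closest unexpanded node is worse than the worst kept result.
        if len(best) >= beam_width and dist > -best[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(vectors[nb], query)
            if len(best) < beam_width or d < -best[0][0]:
                heapq.heappush(frontier, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)  # evict the current worst candidate
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked[:k]]
```

On a chain graph of points 0..9 on a line, searching from node 0 for the point 7.2 walks along the chain and returns nodes 7 and 8.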
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
Nearest Neighbors Search Datasets
The datasets listed below are used in the Hann library.
Index  Dataset       Dimensions  Train Size  Test Size  Neighbors  Distance  Original Source
1      GloVe (25d)   25          1,183,514   10,000     100        Cosine    HDF5 (121MB)
2      GloVe (50d)   50          1,183,514   10,000     100        Cosine    HDF5 (235MB)
3      GloVe (100d)  100         1,183,514   10,000     100        Cosine    HDF5 (463MB)
4      GloVe (200d)  200         1,183,514   10,000     100        Cosine    HDF5 (918MB)
5      Last.fm       65          292,385     50,000     100        Cosine    HDF5 (135MB)
6      MNIST         784…
See the full description on the dataset page: https://huggingface.co/datasets/habedi/nearest-neighbors-datasets.
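Each test query in these datasets comes with its 100 true nearest neighbors under cosine distance. The sketch below shows how such ground truth can be computed by exhaustive scan and then used to score an approximate method's recall@k. It is minimal: it uses plain Python lists rather than reading the HDF5 files, and the function names are illustrative.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


def exact_neighbors(train, query, k):
    """Indices of the k training vectors closest to `query` (exhaustive scan)."""
    return heapq.nsmallest(k, range(len(train)),
                           key=lambda i: cosine_distance(train[i], query))


def recall_at_k(approx, truth, k):
    """Fraction of the true top-k neighbors that an approximate method returned."""
    return len(set(approx[:k]) & set(truth[:k])) / k
```

Real benchmark data has up to 1,183,514 training vectors per query, so in practice this brute-force step is vectorized. The logic stays the same.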
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository includes all the experimental code, together with all required data, for the paper "X. Chen and S. Güttel. Fast and exact fixed-radius neighbor search based on sorting, 2023."
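The core idea behind sorting-based exact fixed-radius search can be sketched as follows. This is an illustration, not the paper's code. For a unit vector v, the Cauchy-Schwarz inequality gives |v·x - v·q| ≤ ‖x - q‖. After sorting the points by their projection onto v, only points whose projection lies in [v·q - r, v·q + r] can be within radius r, and those candidates are then checked exactly. The paper's choice of projection direction and any further refinements are not reproduced here: the direction is simply passed in, and the class name is hypothetical.

```python
import bisect
import math


class SortedRadiusIndex:
    """Exact fixed-radius search using a 1-D projection as a pruning filter (sketch)."""

    def __init__(self, points, direction):
        norm = math.sqrt(sum(c * c for c in direction))
        self.v = [c / norm for c in direction]  # unit projection vector (assumed given)
        self.points = points
        keyed = sorted((self._proj(p), i) for i, p in enumerate(points))
        self.keys = [key for key, _ in keyed]   # sorted projections
        self.order = [i for _, i in keyed]      # point index for each sorted key

    def _proj(self, p):
        return sum(a * b for a, b in zip(self.v, p))

    def query(self, q, radius):
        """Return indices of all points within Euclidean distance `radius` of q."""
        s = self._proj(q)
        # |v.x - v.q| <= ||x - q|| for unit v, so points outside this window are too far.
        lo = bisect.bisect_left(self.keys, s - radius)
        hi = bisect.bisect_right(self.keys, s + radius)
        hits = []
        for j in range(lo, hi):
            i = self.order[j]
            if math.dist(self.points[i], q) <= radius:  # exact check on candidates only
                hits.append(i)
        return sorted(hits)
```

The filter never discards a true neighbor, so the result is exact. How much work it saves depends on how well the chosen direction spreads the data out.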
The Search Nearby API provides an HTTP-based API that lets application developers find facilities located within 1 km of a search location.
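The API's endpoints and parameters are not described here, so the following is only a sketch of the computation a "within 1 km" query implies. It applies a great-circle (haversine) distance filter over facility coordinates, assuming a spherical Earth with a mean radius of 6371.0088 km. The facility tuples and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(facilities, lat, lon, radius_km=1.0):
    """Names of hypothetical (name, lat, lon) facilities within radius_km, nearest first."""
    hits = []
    for name, flat, flon in facilities:
        d = haversine_km(lat, lon, flat, flon)
        if d <= radius_km:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]
```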
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
Dataset Summary
This dataset contains 200-dimensional vectors for 1M images indexed by Yandex and produced by the Se-ResNext-101 model.
Usage
git lfs install
git clone https://huggingface.co/datasets/unum-cloud/ann-t2i-1m
Dataset Structure
The dataset contains three matrices:
base: base.1M.fbin, with 1M vectors used to construct the index.
query: query.public.100K.fbin, with 100K vectors to look up in the index.
truth: groundtruth.public.100K.ibin, with… See the full description on the dataset page: https://huggingface.co/datasets/unum-cloud/ann-t2i-1m.
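The .fbin and .ibin files can be read with the standard library. The sketch below assumes the layout used by the big-ann-benchmarks tooling: a little-endian uint32 vector count and a uint32 dimension, followed by row-major float32 (.fbin) or int32 (.ibin) values. Check this layout against the dataset card before relying on it. The function name `read_bin` is hypothetical.

```python
import struct
import sys
from array import array


def read_bin(path, typecode):
    """Read an .fbin (typecode 'f') or .ibin (typecode 'i') file into a list of rows.

    Assumed layout: uint32 count, uint32 dim, then count * dim 4-byte values (little-endian).
    """
    with open(path, "rb") as f:
        n, dim = struct.unpack("<II", f.read(8))
        data = array(typecode)
        if data.itemsize != 4:
            raise ValueError("expected a 4-byte array typecode")
        data.fromfile(f, n * dim)  # raises EOFError if the file is truncated
    if sys.byteorder == "big":
        data.byteswap()  # array reads native byte order; the file is assumed little-endian
    return [data[i * dim:(i + 1) * dim].tolist() for i in range(n)]
```

For the full 1M × 200 base matrix, returning nested Python lists uses a lot of memory. Loading the values into a NumPy array would be more practical.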
Nearby guides your app viewers to places of interest in your map based on an address they search for or their current location. Search for places of interest using a search radius or the map extent. When using the search radius, set a range for the distance slider that app viewers will use to define their search buffer, or pan the map to see results when showing results on the map. Include directions to help viewers navigate to locations. Enable the export tool to allow viewers to capture images of the map along with results from the search.
Examples:
Create a store locator app where a customer inputs a location and can find the closest or nearby stores and navigate to them.
Build an app where users can find healthcare facilities within a specified distance of a searched address.
Provide viewers with directions and information for election polling locations.
Build an app where users can find nearby trails and view an elevation profile of each result.
Data Requirements
This application requires a feature layer to take full advantage of its capabilities. For more information, see the Layers help topic.
Key App Capabilities
Distance slider - Set a minimum and maximum search radius in which results will be captured.
Map extent result - Show all the results in the map view.
Search results - Provide location information with feature attributes from a configured pop-up.
Include related records - Include related records in the returned results.
Results focused layout - Keep the map out of the app to maintain focus on the search and results.
Filter options - Configure predefined options that allow viewers to filter data in the map.
Export - Capture an image of the map to export and choose to include search results.
Directions - Include the option to provide directions from a searched location to a result.
Elevation profile - Include an option to view the elevation profile of lines.
Export - Print the results and map to a PDF or export results to CSV.
Language switcher - Publish a multilingual app that combines your translated custom text and the UI translations for supported languages.
Home, Zoom Controls, Legend, Layer List, Search
Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
Use the Nearby template to guide your app users to places of interest close to an address. This template helps users find focused types of locations (such as schools) within a search distance of an address, their current location, or another place they specify. They can adjust distance values to change the search radius and get directions to locations they select. For users who are searching, you can set a range for the distance slider so users can define their search buffer, or they can pan the map to see results from the map view. Include directions to help users navigate to locations within a defined search radius. Include the export tool to allow users to capture images of the map along with results from the search.
Examples:
Create a store locator app that allows customers to input a location, find a nearby store, and navigate to it.
Create an app for finding health care facilities within a specified distance of a searched address.
Provide users with directions and information for election polling locations.
Build an app where users can find nearby trails and view an elevation profile of each result.
Data requirements
The Nearby template requires a feature layer to take full advantage of its capabilities.
Key app capabilities
Distance slider - Set a minimum and maximum search radius for finding results.
Map extent result - Show all the results in the map view.
Panel options - Customize result panel location information with feature attributes from a configured pop-up.
Results-focused layout - Keep the map out of the app to maintain focus on the search and results.
Attribute filter - Configure map filter options that are available to app users.
Export - Print or export the search results or selected features as a .pdf, .jpg, or .png file that includes the pop-up content of returned features and an option to include the map. Alternatively, download the search results as a .csv file.
Directions - Provide directions from a searched location to a result location.
Elevation profile - Generate an elevation profile graph across an input line feature that is either selected in the scene or drawn as a single or multisegment line using the tool.
Language switcher - Provide translations for custom text and create a multilingual app.
Home, Zoom controls, Legend, Layer List, Search
Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Results of the proposed method against existing techniques on the Diab dataset.
Pre-trained embeddings for approximate nearest neighbor search using the cosine distance. This dataset consists of two splits:
To use this dataset:
import tensorflow_datasets as tfds

ds = tfds.load('deep1b', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
https://data.gov.tw/license
This dataset mainly integrates activities of the Ministry of Culture and its subordinate institutions, together with activities of other public and private organizations, and supports latitude and longitude queries for nearby activities that have not yet expired.
The Find Health Center tool is a locator tool designed to make data and information concerning Federally-Funded Health Centers more readily available to our users. It is intended to help people in greatest need for health care locate where they could obtain care in their particular location. The user is able to search for health centers nearest to a specific complete address, city and state, state and county, or ZIP code. The search results (health centers) are returned in groups of ten (numbered from one to ten) and are sorted by increasing distance away from the center of the search area (address or county). For each health center entry in the list the user is provided the health center name, address, approximate distance from the center point of the search, telephone number, website address (where available), and a link for driving directions. The user has the option of viewing the search results either on a map or as text (default) and both views provide links to get more detailed information for each returned opportunity.
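The ordering and grouping described above can be reproduced in a few lines. This is a minimal sketch, not the tool's own implementation. It assumes the distances from the search center have already been computed, and `group_results` is a hypothetical name.

```python
def group_results(results, page_size=10):
    """Sort (name, distance) pairs by increasing distance and split them into pages.

    Each page is a list of (number, name, distance), with numbering restarting at 1,
    mirroring the "groups of ten, numbered one to ten" presentation.
    """
    ordered = sorted(results, key=lambda r: r[1])
    pages = []
    for start in range(0, len(ordered), page_size):
        chunk = ordered[start:start + page_size]
        pages.append([(n, name, dist) for n, (name, dist) in enumerate(chunk, start=1)])
    return pages
```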
Dataset Card for minishlab/my-vicinity-repo
This dataset was created using the vicinity library, a lightweight nearest neighbors library with flexible backends. It contains a vector space with 5 items.
Usage
You can load this dataset using the following code:
from vicinity import Vicinity
vicinity = Vicinity.load_from_hub("minishlab/my-vicinity-repo")
After loading the dataset, you can use the vicinity.query method to find the nearest neighbors to a vector.… See the full description on the dataset page: https://huggingface.co/datasets/minishlab/my-vicinity-repo.
MIT License: https://opensource.org/licenses/MIT
This layer contains the data for the survey control points for the City of Round Rock, located in Williamson County, Texas. This layer is part of an original dataset provided and maintained by the City of Round Rock GIS/IT Department. The data in this layer are represented as points. This layer can be used to locate the nearest monument(s) to your site's location. Find the control point nearest your area to determine the corresponding data sheet, and find the download link below. You can also download the monument coordinates and report synopsis.
GPS Point Data Sheets: 01-001 01-002 01-003 01-004 01-005 01-006 01-007 01-008 01-009 01-010 01-011 01-012 01-013 01-014 01-015 01-016 01-017 01-018 01-019 01-020 01-021 01-022 01-023 01-024 01-025 01-026 01-027 01-028 01-029 01-030 01-031 01-032 01-033 01-034 01-035 01-036 01-037 01-038 01-039 01-040 01-041
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Results of the proposed method against existing techniques on the Hepatitis dataset.
MetroCard Vendor Location Finder
MetroCards can be purchased at many locations throughout Westchester, including the County Center, Metro-North train stations, and over 100 neighborhood stores. This map is designed to help you find a vendor near you. To find the nearest MetroCard vendor, type an address into the Search an address box. To clear the search, click the X. Use the Zoom-in tool to see additional features and the Zoom-out tool to see fewer features. To zoom to the full extent, click the Home button. Please note: not all types of MetroCard are available at every sales location. See below for additional ways to purchase a MetroCard and how to become a vendor.
Retail Merchants
Merchants can sell both pre-valued MetroCard (ranging in price from $5.50 to $61.90 with bonus) and Unlimited Ride MetroCard (7-Day or 30-Day). This map is designed to help you find retail merchants within Westchester County. For a complete list of merchants within New York City, Long Island, and New Jersey, visit the MTA's website or call 718-330-1234.
MetroCard Van
A full-service MetroCard van visits Westchester County every month. For more details, including dates and locations of the van, please click here. Riders can buy a regular MetroCard, refill their existing MetroCards, and apply for a Reduced-Fare MetroCard if they are 65 or older or have qualifying disabilities.
Metro-North Railroad Stations
You can buy a joint rail/MetroCard or a separate $25 MetroCard from any Metro-North ticket machine or ticket office. Machines accept cash, credit cards, and ATM/debit cards; a $1 fee is assessed on these purchases. Other joint rail/MetroCard options are also available through Mail and Ride, Metro-North's monthly ticket-by-mail program.
Subway Stations
MetroCard can be purchased from vending machines or staffed sales booths in New York City subway stations. Machines accept cash, credit cards, and ATM/debit cards. Station booth agents accept cash only.
EasyPay
EasyPay is for both full-fare and reduced-fare customers who want to enjoy the benefits of a MetroCard that never runs out of rides. The EasyPay MetroCard is linked to your credit or debit card and refills automatically as you use it.
Become a Vendor
Selling MetroCard brings in customers and commissions. Merchants can earn up to 3% on every card sold. Click here to learn more and complete the vendor application process. Free advertising materials are provided to merchants.
Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. This data set contains the evaluation results of the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, as well as a shifted dot product score to sensitively match modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared to SpectraST.
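The following is a simplified illustration of the shifted dot product idea, not ANN-SoLo's implementation. It assumes peak intensities are normalized to unit length. For simplicity it also assumes singly charged fragments, so a modified peak appears at the unmodified m/z plus the precursor mass difference. Each peak is matched at most once, and the highest-scoring pairs are taken first. The function names and the default tolerance are hypothetical.

```python
import math


def normalize(peaks):
    """Scale (mz, intensity) peaks so the intensity vector has unit length."""
    norm = math.sqrt(sum(i * i for _, i in peaks))
    return [(mz, i / norm) for mz, i in peaks]


def shifted_dot(query, library, mass_shift, tol=0.02):
    """Simplified shifted dot product between two normalized spectra.

    A library peak may match a query peak at the same m/z, or at m/z + mass_shift
    (assumes singly charged fragments). Each peak is used at most once.
    """
    candidates = []
    for qi, (qmz, qint) in enumerate(query):
        for li, (lmz, lint) in enumerate(library):
            if abs(qmz - lmz) <= tol or abs(qmz - (lmz + mass_shift)) <= tol:
                candidates.append((qint * lint, qi, li))
    candidates.sort(reverse=True)  # greedy: strongest matches first
    used_q, used_l, score = set(), set(), 0.0
    for s, qi, li in candidates:
        if qi not in used_q and li not in used_l:
            used_q.add(qi)
            used_l.add(li)
            score += s
    return score
```

With a +16 Da modification on one fragment, an ordinary dot product (shift 0) matches only the unmodified peak. The shifted version also matches the modified one, which is why it helps pair modified spectra with their unmodified library counterparts.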
https://www.propertygenie.us/terms-conditions
The LTR Genie Score of Columbus, GA is 66, indicating a moderate level of rentability for long-term rental properties in the area. The STR Genie Score is 86, showing a high level of rentability for short-term rental or Airbnb properties. The higher STR Genie Score can be attributed to the strong net ROI of 63.04% and the high occupancy rate of 68.97%, both significantly higher than the corresponding metrics for long-term rentals. Additionally, the 1-Year Price Appreciation Forecast of 0.13% suggests a stable market with potential for growth.
Columbus, GA is a city in western Georgia, known for its diverse economy and strong military presence due to nearby Fort Benning. The city offers a mix of urban amenities and outdoor recreational opportunities, making it an attractive location for both residents and visitors.
Based on the metrics provided, Columbus, GA appears more attractive for short-term rental investments because of its higher STR Genie Score and stronger net ROI. Investors looking for higher returns and a potentially more stable market may find success in the short-term rental market in this area. However, long-term rental investments may still be viable for those seeking a more traditional real estate investment approach. Real estate investors should carefully evaluate their investment goals and risk tolerance before deciding on the best strategy for Columbus, GA.
Go to https://afdc.energy.gov/stations/#/find/nearest to access the full database of alternative fuel station locations nationwide, collected and maintained by the U.S. Department of Energy National Renewable Energy Laboratory. A station appears as one point in the data and on the map, regardless of the number of fuel dispensers or charging outlets at that location. For EV charging stations, for example, the data include the number of charging ports available at the specific station. How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
Our aim is to improve the accuracy of the meteorological model with machine learning. To do so, we need a database that contains input variables (meteorological model results) and output data (actual measurements from a meteorological station). The independent variables are outputs of the meteorological model; the dependent variables are measured by the meteorological station. The trained machine learning algorithm takes the variables from the meteorological model and forecasts a meteorological variable. The meteorological model is a WRF model maintained by Meteogalicia, a public meteorological service in Galicia (Spain), with a resolution of 4 km. We use the model outputs at the grid points nearest to the station. This dataset focuses on meteorological stations at Ria Arousa (Spain): Coron (latitude 42.5801 N, longitude 8.80471 W) and Cortegada (latitude 42.626 N, longitude 8.784 W).
The dataset contains:
1. Files (.csv) with the meteorological model output, named LatXX.XX-lonXX.XXp4R4KmD0.csv, where lat and lon are the latitude and longitude of the meteorological station, p is the number of nearest grid points to the station (4 points in this case), R is the spatial resolution of the model (4 km in this case), and D is the forecast day. D0 covers hours H+1 to H+24 from the analysis time (we use the 00Z analysis of the WRF Meteogalicia model), D1 covers hours H+25 to H+48, and so on. Each meteorological variable ends with a numerical suffix identifying the point: the nearest point is "p0" and the farthest is "p3". The columns are the forecast meteorological variables plus a time column (hourly):
lhflx: Surface downward latent heat flux. Units: W m-2 (watts per square meter).
dir: Forecast wind direction at 10 meters, measured clockwise from north. Units: degrees. Unlike dir_o, variable wind is never forecast (no -1 values).
mod: Forecast wind speed at 10 meters. Units: meters per second.
prec: Total accumulated rainfall between consecutive model outputs (every hour in our case). Units: kilograms per square meter.
rh: Relative humidity. Units: fraction.
visibility: Visibility in air. Units: meters. Minimum visibility 26.028316 meters; maximum visibility 24235.000000 meters.
wind_gust: Wind gust at 10 meters. Units: meters per second. Unlike gust_speed_o, it is always forecast (no -1 values).
mslp: Sea-level pressure. Units: pascals.
temp: Air temperature at 2 meters. Units: kelvin.
cape: Convective available potential energy. Units: joules per kilogram.
cin: Convective inhibition. Units: joules per kilogram.
cfl: Cloud area fraction in the low atmospheric layer. I found 1251 samples with values higher than 1, so this feature should perhaps not be trusted too much.
cfm: Cloud area fraction in the middle atmospheric layer. I also found 37 samples with values higher than 1.
conv_prec: Total accumulated convective rainfall between consecutive model outputs (every hour in our case).
HGT500: Geopotential height at 500 mb. Units: m.
HGT850: Geopotential height at 850 mb. Units: m.
T500: Temperature at 500 mb. Units: kelvin.
T850: Temperature at 850 mb. Units: kelvin.
cfh: Cloud cover at high levels. Units: fraction.
cft: Cloud cover at low and mid levels. Units: fraction.
lwflx: Surface downward long-wave radiation flux. Units: W m-2.
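The file-naming convention and forecast-day arithmetic for the model files can be captured in a small parser. How latitude and longitude are written in real filenames (sign, number of decimals) is an assumption here, as is the function name. The lead-hour rule follows directly from D0 = H+1 to H+24 and D1 = H+25 to H+48.

```python
import re

# Assumed pattern for names like "Lat42.58-lon-8.80p4R4KmD0.csv" (sign/decimals assumed).
FILENAME_RE = re.compile(
    r"^Lat(?P<lat>-?\d+(?:\.\d+)?)-?lon(?P<lon>-?\d+(?:\.\d+)?)"
    r"p(?P<points>\d+)R(?P<res_km>\d+)KmD(?P<day>\d+)\.csv$"
)


def parse_model_filename(name):
    """Split a model-output filename into its fields and forecast lead hours."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    day = int(m["day"])
    return {
        "lat": float(m["lat"]),
        "lon": float(m["lon"]),
        "points": int(m["points"]),
        "resolution_km": int(m["res_km"]),
        "day": day,
        # D0 -> H+1..H+24, D1 -> H+25..H+48, ... counted from the 00Z analysis
        "lead_hours": (24 * day + 1, 24 * (day + 1)),
    }
```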
2. Files named stationname.csv: contain the actual meteorological variables measured every 10 or 60 minutes. The variables are:
dir_o: wind direction (degrees)
gust_direction_o: gust direction (degrees)
gust_speed_o: gust speed (m/s)
spd_o: speed (m/s)
std_dir_o: standard deviation of direction (degrees)
std_spd_o: standard deviation of speed (m/s)
gust_spd_max_hour_before_o: maximum gust speed over the previous hour (m/s)
prec_o: precipitation every 10 minutes (mm)
prec_accumulated_1_hour_before: precipitation accumulated over the previous hour (mm)
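If prec_accumulated_1_hour_before needs to be rebuilt from prec_o, it is a rolling sum of six 10-minute readings. The sketch assumes a gap-free 10-minute series and a window that ends at, and includes, the current reading. Both the window convention and the function name are assumptions, not taken from the dataset's own processing.

```python
def hourly_accumulation(prec_10min):
    """Precipitation accumulated over the previous hour from 10-minute totals (mm).

    Assumes a gap-free 10-minute series, so one hour is six readings, and a window
    ending at (and including) the current reading. Returns None until six readings exist.
    """
    out = []
    for i in range(len(prec_10min)):
        if i < 5:
            out.append(None)  # not enough history for a full hour yet
        else:
            out.append(sum(prec_10min[i - 5:i + 1]))
    return out
```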
3. Files named metvar_stationname_pxRxKDX.al: contain the algorithm (independent variables, scaler, PCA, and quality statistics about the algorithm itself and the meteorological model). metvar is the forecast variable, pX is the number of nearest points (4), RXKm is the model resolution (4 km in our case), and D is the forecast day. These files are required by the notebook (operational_arousa) to get the daily results.
The Alternative Fueling Stations dataset is updated daily from the National Renewable Energy Laboratory (NREL) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). For more information about the update cycle and data collection methods, please refer to https://afdc.energy.gov/stations/#/find/nearest?show_about=true. This dataset shows all station access types (public and private) and statuses (available, planned, and temporarily unavailable) by default. To view only publicly available stations, use the access and status filters. The U.S. Department of Energy collects these data in partnership with Clean Cities coalitions and their stakeholders to help fleets and consumers find alternative fueling stations. Clean Cities coalitions foster the nation's economic, environmental, and energy security by working locally to advance affordable, efficient, and clean transportation fuels and technologies. This data can be found on the Alternative Fuels Data Center: https://doi.org/10.21949/1519144. For more information about the data schema and data dictionary, please see https://developer.nrel.gov/docs/transportation/alt-fuel-stations-v1/all/#response-fields. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529008