100+ datasets found

Data from: LMDiskANN.jl: An Implementation of the Low Memory Disk...
figshare.com
zip
Updated Jun 10, 2025
Cite
Alexander V. Mantzaris (2025). LMDiskANN.jl: An Implementation of the Low Memory Disk Approximate Nearest Neighbors Search Algorithm [Dataset]. http://doi.org/10.6084/m9.figshare.29286668.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.29286668.v1
Dataset updated
Jun 10, 2025
Dataset provided by
figshare
Authors
Alexander V. Mantzaris
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
LMDiskANN.jl (v1.2.0) is a Julia package that implements the Low-Memory Disk Approximate Nearest-Neighbor (LM-DiskANN) algorithm, extending DiskANN-style graph search to handle billion-scale vector datasets while keeping RAM usage to a minimum. It stores adjacency lists on disk via memory-mapped files, performs tunable best-first graph traversals for fast and accurate queries, and supports dynamic insertions and deletions with automatic pruning to maintain a compact index. The library exposes knobs to balance recall against latency, and it optionally pairs a LevelDB key–value store with the node IDs for flexible external key lookup. These capabilities make LMDiskANN.jl well-suited for embedding retrieval, recommendation systems, and other large-scale similarity-search workloads that need high throughput on commodity hardware.
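The best-first graph traversal mentioned in this description can be illustrated with a small sketch. This is not the LMDiskANN.jl API and not the package's code: the function name, the in-memory `graph` and `vectors` dictionaries (stand-ins for on-disk adjacency lists), and the `beam_width` parameter are all assumptions made for illustration. A larger `beam_width` visits more nodes, which is one way a recall/latency knob could work.

```python
import heapq
import math


def best_first_search(graph, vectors, entry, query, k, beam_width):
    """Greedy best-first search over a neighbor graph (illustrative only).

    graph:      dict node_id -> list of neighbor ids (hypothetical stand-in
                for adjacency lists that a real index would keep on disk)
    vectors:    dict node_id -> tuple of floats
    beam_width: size of the result pool kept during the search (>= k)
    Returns up to k (distance, node_id) pairs, closest first.
    """
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {entry}
    frontier = [(dist(entry), entry)]      # min-heap: next node to expand
    best = [(-frontier[0][0], entry)]      # max-heap via negated distance

    while frontier:
        d, node = heapq.heappop(frontier)
        # Stop once the pool is full and the closest unexpanded node
        # is farther than the worst result currently kept.
        if len(best) >= beam_width and d > -best[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            nd = dist(nb)
            if len(best) < beam_width or nd < -best[0][0]:
                heapq.heappush(frontier, (nd, nb))
                heapq.heappush(best, (-nd, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)    # drop the current worst result

    return sorted((-neg_d, n) for neg_d, n in best)[:k]
```

The search is approximate by design: with a small `beam_width` it can stop in a local minimum of the graph, which is the trade-off the description refers to.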
nearest-neighbors-datasets
huggingface.co
Updated Mar 22, 2025
+ more versions
Cite
Hassan Abedi (2025). nearest-neighbors-datasets [Dataset]. https://huggingface.co/datasets/habedi/nearest-neighbors-datasets
Dataset updated
Mar 22, 2025
Authors
Hassan Abedi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Nearest Neighbors Search Datasets

The datasets listed below are used in the Hann library.

Index  Dataset       Dimensions  Train Size  Test Size  Neighbors  Distance  Original Source
1      GloVe (25d)   25          1,183,514   10,000     100        Cosine    HDF5 (121MB)
2      GloVe (50d)   50          1,183,514   10,000     100        Cosine    HDF5 (235MB)
3      GloVe (100d)  100         1,183,514   10,000     100        Cosine    HDF5 (463MB)
4      GloVe (200d)  200         1,183,514   10,000     100        Cosine    HDF5 (918MB)
5      Last.fm       65          292,385     50,000     100        Cosine    HDF5 (135MB)
6      MNIST         784…

See the full description on the dataset page: https://huggingface.co/datasets/habedi/nearest-neighbors-datasets.
snn_exp
figshare.com
bin
Updated Dec 9, 2023
Cite
Stefan Güttel; Xinye Chen (2023). snn_exp [Dataset]. http://doi.org/10.6084/m9.figshare.24781473.v1
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.24781473.v1
Dataset updated
Dec 9, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Stefan Güttel; Xinye Chen
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This repository contains all the experimental code and the required data for the paper "X. Chen and S. Güttel. Fast and exact fixed-radius neighbor search based on sorting, 2023."
Search Nearby API | DATA.GOV.HK
data.gov.hk
Updated Jul 25, 2024
Cite
data.gov.hk (2024). Search Nearby API | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-landsd-openmap-development-search-nearby-api
Dataset updated
Jul 25, 2024
Dataset provided by
data.gov.hk
Description
The Search Nearby API provides an HTTP-based API for application developers to find the facilities located within 1 km of the search location.
ann-t2i-1m
hf-proxy-cf.effarig.site
huggingface.co
Updated Jul 27, 2024
Cite
Unum (2024). ann-t2i-1m [Dataset]. https://hf-proxy-cf.effarig.site/datasets/unum-cloud/ann-t2i-1m
Dataset updated
Jul 27, 2024
Dataset authored and provided by
Unum
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Summary

This dataset contains 200-dimensional vectors for 1M images indexed by Yandex and produced by the Se-ResNext-101 model.

Usage

git lfs install
git clone https://huggingface.co/datasets/unum-cloud/ann-t2i-1m

Dataset Structure

The dataset contains three matrices:

base: base.1M.fbin with 1M vectors to construct the index.
query: query.public.100K.fbin with 100K vectors to look up in the index.
truth: groundtruth.public.100K.ibin with… See the full description on the dataset page: https://huggingface.co/datasets/unum-cloud/ann-t2i-1m.
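As a rough sketch of how the .fbin matrices listed above might be read, the snippet below assumes the layout commonly used by ANN benchmark suites: an 8-byte header holding two little-endian unsigned 32-bit integers (number of vectors, then dimensions), followed by the vectors as row-major float32 values. That layout is an assumption, not something stated in this description, so check the dataset page before relying on it. The names `read_fbin` and `vector_at` are hypothetical helpers.

```python
import struct
import sys
from array import array


def read_fbin(path):
    """Read an .fbin matrix under the ASSUMED layout:
    uint32 rows, uint32 cols (little-endian), then rows*cols float32 values.
    Returns (rows, cols, flat array of floats)."""
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) != 8:
            raise ValueError("file too short for an fbin header")
        rows, cols = struct.unpack("<II", header)
        data = array("f")
        data.fromfile(f, rows * cols)   # raises EOFError if the file is truncated
    if sys.byteorder == "big":
        data.byteswap()                 # file values are assumed little-endian
    return rows, cols, data


def vector_at(data, cols, i):
    """Return the i-th vector (row) from the flat array."""
    return data[i * cols:(i + 1) * cols]
```

Keeping the values in one flat `array` avoids building a million small Python lists; a numerical library would be the more usual choice for real workloads.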
Nearby
city-of-lawrenceville-arcgis-hub-lville.hub.arcgis.com
Updated Jun 30, 2020
Cite
esri_en (2020). Nearby [Dataset]. https://city-of-lawrenceville-arcgis-hub-lville.hub.arcgis.com/items/9d3f21cfd9b14589968f7e5be91b52c8
Dataset updated
Jun 30, 2020
Dataset provided by
Esri (http://esri.com/)
Authors
esri_en
Description
Nearby guides your app viewers to places of interest in your map based on an address location they search for or their current location. Search for places of interest using a search radius or the map extent. When using the search radius, set a range for the distance slider that app viewers will use to define their search buffer, or pan the map to see results when showing results on the map. Include directions to help viewers navigate to locations. Enable the export tool to allow viewers to capture images of the map along with results from the search.

Examples:
Create a store locator app where a customer inputs a location and can find the closest or nearby stores and navigate to them
Build an app where the users can find healthcare facilities within a specified distance of a searched address
Provide viewers with directions and information for election polling locations
Build an app where users can find nearby trails and view an elevation profile of each result

Data Requirements
This application requires a feature layer to take full advantage of its capabilities. For more information, see the Layers help topic.

Key App Capabilities
Distance slider - Set a minimum and maximum search radius in which results will be captured
Map extent result - Show all the results in the map view
Search results - Provide location information with feature attributes from a configured pop-up
Include related records - Include related records in the results
Results focused layout - Keep the map out of the app to maintain focus on the search and results
Filter options - Configure predefined options that allow viewers to filter data in the map
Export - Capture an image of the map to export and choose to include search results
Directions - Include the option to provide directions from a searched location to a result
Elevation profile - Include an option to view the elevation profile of lines
Export - Print the results and map to a PDF or export results to csv
Language switcher - Publish a multilingual app that combines your translated custom text and the UI translations for supported languages
Home, Zoom Controls, Legend, Layer List, Search

Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
Nearby
anla-esp-esri-co.hub.arcgis.com
noveladata.com
+1 more
Updated Jul 1, 2020
Cite
esri_en (2020). Nearby [Dataset]. https://anla-esp-esri-co.hub.arcgis.com/items/9d3f21cfd9b14589968f7e5be91b52c8
Dataset updated
Jul 1, 2020
Dataset provided by
Esri (http://esri.com/)
Authors
esri_en
Description
Use the Nearby template to guide your app users to places of interest close to an address. This template helps users find focused types of locations (such as schools) within a search distance of an address, their current location, or other place they specify. They can adjust distance values to change the search radius and get directions to locations they select. For users who are searching, you can set a range for the distance slider so users can define their search buffer or pan the map to see results from the map view. Include directions to help users navigate to locations within a defined search radius. Include the export tool to allow users to capture images of the map along with results from the search.

Examples:
Create a store locator app that allows customers to input a location, find a nearby store, and navigate to it.
Create an app for finding health care facilities within a specified distance of a searched address.
Provide users with directions and information for election polling locations.
Build an app where users can find nearby trails and view an elevation profile of each result.

Data requirements
The Nearby template requires a feature layer to take full advantage of its capabilities.

Key app capabilities
Distance slider - Set a minimum and maximum search radius for finding results.
Map extent result - Show all the results in the map view.
Panel options - Customize result panel location information with feature attributes from a configured pop-up.
Results-focused layout - Keep the map out of the app to maintain focus on the search and results.
Attribute filter - Configure map filter options that are available to app users.
Export - Print or export the search results or selected features as a .pdf, .jpg, or .png file that includes the pop-up content of returned features and an option to include the map. Alternatively, download the search results as a .csv file.
Directions - Provide directions from a searched location to a result location.
Elevation profile - Generate an elevation profile graph across an input line feature that can be selected in the scene or from drawing a single or multisegment line using the tool.
Language switcher - Provide translations for custom text and create a multilingual app.
Home, Zoom controls, Legend, Layer List, Search

Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
Results of the proposed method against existing techniques on Diab dataset.
plos.figshare.com
figshare.com
xls
Updated May 30, 2025
Cite
Sunil Kumar; Sudeep Varshney; Usha Jain; Prashant Johri; Abdulaziz S. Almazyad; Ali Wagdy Mohamed; Mehdi Hosseinzadeh; Mohammad Shokouhifar (2025). Results of the proposed method against existing techniques on Diab dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0322738.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0322738.t003
Dataset updated
May 30, 2025
Dataset provided by
PLOS ONE
Authors
Sunil Kumar; Sudeep Varshney; Usha Jain; Prashant Johri; Abdulaziz S. Almazyad; Ali Wagdy Mohamed; Mehdi Hosseinzadeh; Mohammad Shokouhifar
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Results of the proposed method against existing techniques on Diab dataset.
deep1b
tensorflow.org
Updated Sep 3, 2024
Cite
(2024). deep1b [Dataset]. https://www.tensorflow.org/datasets/catalog/deep1b
Dataset updated
Sep 3, 2024
Description
Pre-trained embeddings for approximate nearest neighbor search using the cosine distance. This dataset consists of two splits:

'database': consists of 9,990,000 data points, each has features: 'embedding' (96 floats), 'index' (int64), 'neighbors' (empty list).

'test': consists of 10,000 data points, each has features: 'embedding' (96 floats), 'index' (int64), 'neighbors' (list of 'index' and 'distance' of the nearest neighbors in the database.)

To use this dataset:

import tensorflow_datasets as tfds

# The splits listed above are 'database' and 'test'; there is no 'train' split.
ds = tfds.load('deep1b', split='database')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
Latitude and longitude search for nearby unexpired events
data.gov.tw
json
Updated Feb 15, 2024
Cite
Ministry of Culture (2024). Latitude and longitude search for nearby unexpired events [Dataset]. https://data.gov.tw/en/datasets/10044
Available download formats: json
Dataset updated
Feb 15, 2024
Dataset authored and provided by
Ministry of Culture
License
https://data.gov.tw/license
Description
This dataset provides latitude- and longitude-based queries for nearby events that have not yet expired. It integrates events from the Ministry of Culture, its subordinate institutions, and other public and private organizations.
Find a Health Center
gimi9.com
catalog.data.gov
Updated Dec 22, 2010
Cite
(2010). Find a Health Center [Dataset]. https://gimi9.com/dataset/data-gov_find-a-health-center-c0304
Dataset updated
Dec 22, 2010
Description
The Find Health Center tool is a locator tool designed to make data and information concerning Federally-Funded Health Centers more readily available to our users. It is intended to help people in greatest need for health care locate where they could obtain care in their particular location. The user is able to search for health centers nearest to a specific complete address, city and state, state and county, or ZIP code. The search results (health centers) are returned in groups of ten (numbered from one to ten) and are sorted by increasing distance away from the center of the search area (address or county). For each health center entry in the list the user is provided the health center name, address, approximate distance from the center point of the search, telephone number, website address (where available), and a link for driving directions. The user has the option of viewing the search results either on a map or as text (default) and both views provide links to get more detailed information for each returned opportunity.
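The ranking behavior described here (results sorted by increasing distance from the search center and returned in groups of ten) can be sketched as follows. This is only an illustration, not the tool's implementation: the haversine great-circle formula, the `places` record layout, and the function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_pages(center, places, page_size=10):
    """Rank places by distance from center and split them into pages.

    center: (lat, lon) tuple
    places: list of dicts with hypothetical keys "name", "lat", "lon"
    Returns a list of pages; each page is a list of (distance_km, name).
    """
    ranked = sorted(
        (haversine_km(center[0], center[1], p["lat"], p["lon"]), p["name"])
        for p in places
    )
    return [ranked[i:i + page_size] for i in range(0, len(ranked), page_size)]
```

A brute-force sort like this is fine for a few thousand locations; larger catalogs would typically use a spatial index instead.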
my-vicinity-repo
huggingface.co
Updated Feb 28, 2025
Cite
Minish (2025). my-vicinity-repo [Dataset]. https://huggingface.co/datasets/minishlab/my-vicinity-repo
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 28, 2025
Dataset authored and provided by
Minish
Description
Dataset Card for minishlab/my-vicinity-repo

This dataset was created using the vicinity library, a lightweight nearest neighbors library with flexible backends. It contains a vector space with 5 items.

Usage

You can load this dataset using the following code:

from vicinity import Vicinity
vicinity = Vicinity.load_from_hub("minishlab/my-vicinity-repo")

After loading the dataset, you can use the vicinity.query method to find the nearest neighbors to a vector.… See the full description on the dataset page: https://huggingface.co/datasets/minishlab/my-vicinity-repo.
Survey Control Points
geohub.roundrocktexas.gov
Updated May 19, 2021
+ more versions
Cite
City of Round Rock (2021). Survey Control Points [Dataset]. https://geohub.roundrocktexas.gov/items/d555a98f6a0e407593c23c31b9e75829
Dataset updated
May 19, 2021
Dataset authored and provided by
City of Round Rock
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This layer contains the data for the survey control points for the City of Round Rock, located in Williamson County, Texas. This layer is part of an original dataset provided and maintained by the City of Round Rock GIS/IT Department. The data in this layer are represented as points.

This layer can be used to locate the nearest monument(s) to your site’s location. Find the control point nearest your area to determine the corresponding data sheet, and find the download link below. You can also download the monument coordinates and report synopsis.

GPS Point Data Sheets:
01-001 01-002 01-003 01-004
01-005 01-006 01-007 01-008
01-009 01-010 01-011 01-012
01-013 01-014 01-015 01-016
01-017 01-018 01-019 01-020
01-021 01-022 01-023 01-024
01-025 01-026 01-027 01-028
01-029 01-030 01-031 01-032
01-033 01-034 01-035 01-036
01-037 01-038 01-039 01-040
01-041
Results of the proposed method against existing techniques on Hepatitis...
figshare.com
xls
Updated May 30, 2025
Cite
Sunil Kumar; Sudeep Varshney; Usha Jain; Prashant Johri; Abdulaziz S. Almazyad; Ali Wagdy Mohamed; Mehdi Hosseinzadeh; Mohammad Shokouhifar (2025). Results of the proposed method against existing techniques on Hepatitis dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0322738.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0322738.t005
Dataset updated
May 30, 2025
Dataset provided by
PLOS ONE
Authors
Sunil Kumar; Sudeep Varshney; Usha Jain; Prashant Johri; Abdulaziz S. Almazyad; Ali Wagdy Mohamed; Mehdi Hosseinzadeh; Mohammad Shokouhifar
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Results of the proposed method against existing techniques on Hepatitis dataset.
MetroCard Vendor Location Finder
gis.westchestergov.com
Updated Jun 9, 2017
Cite
Westchester County GIS (2017). MetroCard Vendor Location Finder [Dataset]. https://gis.westchestergov.com/app/metrocard-vendor-location-finder
Dataset updated
Jun 9, 2017
Dataset authored and provided by
Westchester County GIS
Description
MetroCard Vendor Location Finder

MetroCards can be purchased at many locations throughout Westchester including the County Center, Metro-North train stations, and over 100 neighborhood stores. This map is designed to help you find a vendor near you.

To find the nearest MetroCard vendor, type an address in the "Search an address" box.
To clear the search, click on the X.
Use the Zoom-in tool to see additional features and Zoom-out to see fewer features.
To zoom to the full extent, click on the Home button.

Please note: Not all types of MetroCard are available at every sales location. See below for additional ways to purchase a MetroCard and how to become a vendor.

Retail Merchants
Merchants can sell both pre-valued MetroCard (ranging in price from $5.50 to $61.90 with bonus) and Unlimited Ride MetroCard (7-Day or 30-Day). This map is designed to help you find retail merchants within Westchester County. For a complete list of merchants within New York City, Long Island, and New Jersey visit the MTA’s website or call 718-330-1234.

MetroCard Van
There is a full-service MetroCard van that visits Westchester County every month. For more details including dates and locations of the van please click here. Riders are able to buy a regular MetroCard, refill their existing MetroCards, and apply for a Reduced-Fare MetroCard if they are 65 and older or have qualifying disabilities.

Metro-North Railroad Stations
You can buy a joint rail/MetroCard or a separate $25 MetroCard from any Metro-North ticket machine or ticket office. Machines accept cash, credit cards and ATM/debit cards - a $1 fee is assessed on these purchases. Other joint rail/MetroCard options are also available through Mail and Ride, Metro-North's monthly ticket-by-mail program.

Subway Stations
MetroCard can be purchased from vending machines or staffed sales booths in New York City subway stations. Machines accept cash, credit cards and ATM/debit cards. Station booth agents accept cash only.

EasyPay
EasyPay is for both full-fare and reduced-fare customers who want to enjoy the benefits of a MetroCard that never runs out of rides. The EasyPay MetroCard is linked to your credit or debit card, and refills automatically as you use it.

Become a Vendor
Selling MetroCard brings in customers and commissions. Merchants can earn up to 3% on every card sold. Click here to learn more and complete the vendor application process. Free advertising materials are provided to merchants.
Data from: Fast open modification spectral library searching through...
ebi.ac.uk
omicsdi.org
Updated May 25, 2021
Cite
Wout Bittremieux (2021). Fast open modification spectral library searching through approximate nearest neighbor indexing [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD009861
Dataset updated
May 25, 2021
Authors
Wout Bittremieux
Variables measured
Proteomics
Description
Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. This data set contains the evaluation results of the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate, as well as a shifted dot product score to sensitively match modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared to SpectraST.
Columbus, GA Real Estate Investment Insights
propertygenie.us
Updated Jul 12, 2025
Cite
PropertyGenie (2025). Columbus, GA Real Estate Investment Insights [Dataset]. https://www.propertygenie.us/market-insight/columbus-ga
Dataset updated
Jul 12, 2025
Dataset authored and provided by
PropertyGenie
License
https://www.propertygenie.us/terms-conditions
Time period covered
May 31, 2025
Variables measured
Population, Rental Count, Job Growth (%), LTR Genie Score, STR Genie Score, Income Growth (%), Rental Demand Score, LTR Monthly Cash Flow, Population Growth (%), STR Monthly Cash Flow, and 6 more
Description
The LTR Genie Score of Columbus, GA is 66, indicating a moderate level of rentability for long-term rental properties in the area. The STR Genie Score is 86, showing a high level of rentability for short-term rental or Airbnb properties. The higher STR Genie Score can be attributed to the strong net ROI of 63.04% and high occupancy rate of 68.97, which are both significantly higher than the metrics for long-term rentals. Additionally, the 1-Year Price Appreciation Forecast of 0.13% suggests a stable market with potential for growth.

Columbus, GA is a city located in western Georgia, known for its diverse economy and strong military presence due to the nearby Fort Benning. The city offers a mix of urban amenities and outdoor recreational opportunities, making it an attractive location for both residents and visitors.

Based on the metrics provided, Columbus, GA appears to be more attractive for short-term rental investments due to the higher STR Genie Score and stronger net ROI. Investors looking for higher returns and a potentially more stable market may find success in the short-term rental market in this area. However, long-term rental investments may still be viable for those seeking a more traditional real estate investment approach. It is recommended for real estate investors to carefully evaluate their investment goals and risk tolerance before deciding on the best strategy for Columbus, GA.
Alternative Fuel Stations in New York
s.cnmilf.com
data.ny.gov
+1 more
Updated Jun 28, 2025
+ more versions
Cite
data.ny.gov (2025). Alternative Fuel Stations in New York [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/alternative-fuel-stations-in-new-york
Dataset updated
Jun 28, 2025
Dataset provided by
data.ny.gov
Area covered
New York
Description
Go to https://afdc.energy.gov/stations/#/find/nearest to access the full database of alternative fuel station locations nationwide, collected and maintained by the U.S. Department of Energy National Renewable Energy Laboratory. A station appears as one point in the data and on the map, regardless of the number of fuel dispensers or charging outlets at that location. For EV charging stations, for example, the data includes the number of charging ports available at the specific station. How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
Weather forecasting at Ria Arousa (Spain) using AI
kaggle.com
zip
Updated Apr 5, 2021
+ more versions
Cite
Jorge Robinat (2021). Weather forecasting at Ria Arousa (Spain) using AI [Dataset]. https://www.kaggle.com/jorgerobinat/weather-forecasting-at-ria-arousa-spain-using-ai
Available download formats: zip (3513141742 bytes)
Dataset updated
Apr 5, 2021
Authors
Jorge Robinat
Area covered
Spain, Ría de Arousa
Description
Context

Our aim is to improve the accuracy of the meteorological model with Machine Learning. To do so we need a database that contains input variables (meteorological model results) and output data (actual data from a meteorological station). Independent variables are outputs of the meteorological model. Dependent variables are measured by the meteorological station. The trained Machine Learning algorithm will take the variables from the meteorological model and forecast a meteorological variable. The meteorological model is a WRF model maintained by Meteogalicia, a public meteorological service from Galicia (Spain). The model has a resolution of 4 km. We take the model outputs at the grid points nearest to the station. This dataset is focused on meteorological stations at Ria Arousa (Spain). The meteorological stations are Coron, at latitude 42.5801 N and longitude 8.80471 W, and Cortegada, at latitude 42.626 N and longitude 8.784 W.

Content

The dataset contains:

1. Files (.csv) with the meteorological model output. Format: LatXX.XX-lonXX.XXp4R4KmD0.csv, where lat and lon represent the latitude and longitude of the meteorological station, p is the number of nearest points from the station (4 points in this case), R is the spatial resolution of the model (4 km in this case), and D is the forecast day. D0 represents hours H+1 to H+24 from the analysis time (we use the 00Z analysis of the WRF Meteogalicia model), D1 represents hours H+25 to H+48, and so on. Each meteorological variable ends with a numerical suffix representing the point. The nearest point is "p0" and the farthest point would be "p3". Columns are the forecast meteorological variables and a time column (every hour):

lhflx: Surface downward latent heat flux. Units: watts per square meter.

dir: Predicted wind direction at 10 meters, measured clockwise from North. Units are degrees. Unlike dir_o, no variable wind is forecast (no -1 values).

mod: Wind intensity forecasted at 10 meters. Units are meters per second.

prec: Total accumulated rainfall between each model output. In our case, every hour. Units kilograms per meter squared.

rh: Relative Humidity. Units fraction

visibility: Visibility in air. Units meters. Minimum visibility 26.028316 meters. Maximum visibility 24235.000000

wind_gust: Wind gust at 10 meters. Units are meters per second. Unlike wind gust_o always forecasted (no -1 value)

mslp: Sea Level Pressure in pascals

temp: Air Temperature in Kelvin at 2 meters

cape: Convective available potential energy. Units: joules per kilogram. Check this link for more information

cin: Convective inhibition. Click here for more information. Units: joules per kilogram

cfl: Cloud area fraction at low atmosphere layer. I found 1251 samples with values higher than 1 !! Perhaps, we wouldn’t trust this feature so much.

cfm: Cloud area fraction at mid atmosphere layer. Also, I found 37 samples with values higher than 1.

conv_prec: Total accumulated convective rainfall between each model output. Every hour in our case.

HGT500: Geopotential height at 500mb. Units m

HGT850: Geopotential height at 850mb. Units m

T500: Temperature at 500mb. Units Kelvin

T850: Temperature at 850mb. Units Kelvin

cfh: Cloud cover at high levels. Units fraction

cft: Cloud cover at low and mid-levels. Units fraction

lwflx: Surface downward latent heat flux. Units: W m-2
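The file-name convention described in item 1 above can be unpacked programmatically. The sketch below is an assumption-laden illustration: the regular expression is inferred from the single pattern LatXX.XX-lonXX.XXp4R4KmD0.csv, the function name `parse_model_filename` is hypothetical, and the example file name used with it is made up rather than taken from the dataset. The hour range follows the stated rule that D0 covers H+1 to H+24 and D1 covers H+25 to H+48.

```python
import re

# Pattern inferred from "LatXX.XX-lonXX.XXp4R4KmD0.csv" (an assumption;
# real file names in the dataset may differ in detail).
FILENAME_RE = re.compile(
    r"^lat(?P<lat>-?\d+(?:\.\d+)?)-lon(?P<lon>-?\d+(?:\.\d+)?)"
    r"p(?P<points>\d+)R(?P<res_km>\d+)KmD(?P<day>\d+)\.csv$",
    re.IGNORECASE,
)


def parse_model_filename(name):
    """Split a model-output file name into its documented components."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    day = int(m["day"])
    return {
        "lat": float(m["lat"]),
        "lon": float(m["lon"]),
        "points": int(m["points"]),          # number of nearest grid points
        "resolution_km": int(m["res_km"]),   # model spatial resolution
        "day": day,                          # forecast day (D0, D1, ...)
        "first_hour": 24 * day + 1,          # D0 -> H+1, D1 -> H+25
        "last_hour": 24 * (day + 1),         # D0 -> H+24, D1 -> H+48
    }
```

For instance, calling `parse_model_filename("Lat42.58-lon-8.80p4R4KmD1.csv")` on that made-up name yields forecast day 1 with hours 25 through 48.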

2. Files with format stationname.csv: Contain the actual meteorological variables measured every 10 or 60 minutes. Variables are:

dir_o: wind direction (degrees)

gust_direction_o: gust direction (degrees)

gust_speed_o: gust speed (m/s)

spd_o: speed (m/s)

std_dir_o: standard deviation direction (degrees)

std_spd_o: standard deviation speed (m/s)

gust_spd_max_hour_before_o: max gust speed an hour before (m/s)

prec_o: precipitation every 10 minutes (mm)

prec_accumulated_1_hour_before: precipitation accumulated one hour before (mm)

3. Files with format metvar_stationname_pxRxKDX.al: Contain the algorithm (independent variables, scaler, PCA, and quality statistics about the algorithm itself and the meteorological model). metvar: the variable forecasted. pX: number of the 4 nearest points. RXKm: model resolution (4 km in our case). D: forecast day. These files are required by the notebook (operational_arousa) to get the daily results.

Alternative Fueling Stations
catalog.data.gov
gimi9.com
+6 more
Updated May 2, 2025
+ more versions
Cite
National Renewable Energy Laboratory (NREL) (Point of Contact) (2025). Alternative Fueling Stations [Dataset]. https://catalog.data.gov/dataset/alternative-fueling-stations1
Dataset updated
May 2, 2025
Dataset provided by
National Renewable Energy Laboratory (NREL) (Point of Contact)
Description
The Alternative Fueling Stations dataset is updated daily from the National Renewable Energy Laboratory (NREL) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). For more information about the update cycle and data collection methods, please refer to https://afdc.energy.gov/stations/#/find/nearest?show_about=true. This dataset shows all station access types (public and private) and statuses (available, planned, and temporarily unavailable) by default. To view only publicly available stations, use the access and status filters. The U.S. Department of Energy collects these data in partnership with Clean Cities coalitions and their stakeholders to help fleets and consumers find alternative fueling stations. Clean Cities coalitions foster the nation's economic, environmental, and energy security by working locally to advance affordable, efficient, and clean transportation fuels and technologies. This data can be found on the Alternative Fuels Data Center: https://doi.org/10.21949/1519144. For more information about the data schema and data dictionary, please see https://developer.nrel.gov/docs/transportation/alt-fuel-stations-v1/all/#response-fields. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529008
