97 datasets found

Datasets for figures and tables
catalog.data.gov
datasets.ai
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Datasets for figures and tables [Dataset]. https://catalog.data.gov/dataset/datasets-for-figures-and-tables
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Software: Model simulations were conducted using WRF version 3.8.1 (available at https://github.com/NCAR/WRFV3) and CMAQ version 5.2.1 (available at https://github.com/USEPA/CMAQ). The meteorological and concentration fields created using these models are too large to archive on ScienceHub (approximately 1 TB) and are archived on EPA’s high performance computing archival system (ASM) at /asm/MOD3APP/pcc/02.NOAH.v.CLM.v.PX/.

Figures: Figures 1–6 and Figure 8 were created using NCAR Command Language (NCL) scripts (https://www.ncl.ucar.edu/get_started.shtml). NCL code can be downloaded from the NCAR website (https://www.ncl.ucar.edu/Download/) at no cost. The data used for these figures are archived on EPA’s ASM system and are available upon request. Figures 7, 8b-c, 8e-f, 8h-i, and 9 were created using the AMET utility developed by U.S. EPA/ORD. AMET can be freely downloaded and used at https://github.com/USEPA/AMET. The modeled data paired in space and time provided in this archive can be used to recreate these figures.

The data contained in the compressed zip files are organized in comma-delimited files with descriptive headers or space-delimited files that match tabular data in the manuscript. The data dictionary provides additional information about the files and their contents.

This dataset is associated with the following publication: Campbell, P., J. Bash, and T. Spero. Updates to the Noah Land Surface Model in WRF‐CMAQ to Improve Simulated Meteorology, Air Quality, and Deposition. Journal of Advances in Modeling Earth Systems. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(1): 231-256, (2019).
Warehouse and Retail Sales
catalog.data.gov
data.montgomerycountymd.gov
+4more
Updated Jun 29, 2025
Cite
data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
Explore at:
Dataset updated
Jun 29, 2025
Dataset provided by
data.montgomerycountymd.gov
Description
This dataset contains a list of sales and movement data by item and department appended monthly. Update Frequency : Monthly
MODFLOW-LGR data sets for the Great Basin carbonate and alluvial aquifer...
catalog.data.gov
data.usgs.gov
+4more
Updated Nov 30, 2024
Cite
U.S. Geological Survey (2024). MODFLOW-LGR data sets for the Great Basin carbonate and alluvial aquifer system model version 3.0: Revisions in southwestern Utah and east central Nevada [Dataset]. https://catalog.data.gov/dataset/modflow-lgr-data-sets-for-the-great-basin-carbonate-and-alluvial-aquifer-system-model-vers
Explore at:
Dataset updated
Nov 30, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Utah, Great Basin
Description
A new version of the previously published steady-state numerical groundwater flow models of the Great Basin carbonate and alluvial aquifer system (GBCAAS) was developed in conjunction with U.S. Geological Survey (USGS) studies in Parowan, Pine, and Wah Wah Valleys, Utah. This version of the model is considered to be GBCAAS v. 3.0 and supersedes previous versions. This model added 15 transient calibration stress periods and 14 projection stress periods, aquifer storage properties, historical withdrawals in Parowan Valley, and observations of water-level changes in Parowan Valley to the previous steady-state versions. Recharge in Parowan Valley and withdrawal from wells in Parowan Valley and two nearby wells in Cedar City Valley vary for each calibration stress period representing conditions from March 1940 to November 2013. Stresses, including recharge, are the same in each stress period as in the steady-state stress period for all areas outside of Parowan Valley. This data release contains one calibration simulation and one projection simulation. The model is calibrated to transient conditions only in Parowan Valley. Simulated storage properties outside of Parowan Valley are set the same as the Parowan Valley properties and should not be considered calibrated. The projection simulation was used to estimate that reducing withdrawals in Parowan Valley from 35,000 to about 22,000 acre-feet per year should stabilize groundwater levels in the valley if recharge varies as it did from about 1950 to 2012. It was also used to estimate that withdrawals of 15,000 acre-feet per year from Pine Valley and 6,500 acre-feet per year from Wah Wah Valley could ultimately (long-term steady-state) cause water-level declines of about 1,900 feet near the withdrawal wells and more than 5 feet over about 10,500 square miles. This USGS data release contains all of the input and output files for the simulations described in the associated model documentation report (https://doi.org/10.3133/sir20175072).
This data release also contains source code needed to run the models. Model files presented in this data release were modified from an existing, calibrated, steady-state model of the Great Basin carbonate and alluvial aquifer system. SIR 2014-5213 (https://pubs.usgs.gov/sir/2014/5213/) and SIR 2017-5011 (https://doi.org/10.3133/sir20175011) document the construction and calibration of the previous versions of this model. Modifications that were made to the input files and discussion of model results are documented in SIR2017-5072 (https://doi.org/10.3133/sir20175072), which is associated with this data release. The model consists of a parent and a child model and must be run using MODFLOW-LGR. The child model is far removed from the area considered for this project, but is being kept with the model so that one model version exists of the Great Basin carbonate and alluvial aquifer system that incorporates all refinements and improvements. The model files documented in this data release should be used instead of previous versions.
Data from: A Generic Local Algorithm for Mining Data Streams in Large...
catalog.data.gov
datasets.ai
+3more
Updated Apr 10, 2025
Cite
Dashlink (2025). A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems [Dataset]. https://catalog.data.gov/dataset/a-generic-local-algorithm-for-mining-data-streams-in-large-distributed-systems
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval, and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.
National Soils Database - Dataset - data.gov.ie
data.gov.ie
Updated Jul 23, 2021
Cite
data.gov.ie (2021). National Soils Database - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/national-soils-database
Explore at:
Dataset updated
Jul 23, 2021
Dataset provided by
data.gov.ie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Soil Database has produced a national database of soil geochemistry including point and spatial distribution maps of major nutrients, major elements, essential trace elements, trace elements of special interest and minor elements. In addition, this study has generated a National Soil Archive, comprising bulk soil samples and a nucleic acids archive each of which represent a valuable resource for future soils research in Ireland. The geographical coherence of the geochemical results was considered to be predominantly underpinned by underlying parent material and glacial geology. Other factors such as soil type, land use, anthropogenic effects and climatic effects were also evident. The coherence between elements, as displayed by multivariate analyses, was evident in this study. Examples included strong relationships between Co, Fe, As, Mn and Cu. This study applied large-scale microbiological analysis of soils for the first time in Ireland and in doing so also investigated microbial community structure in a range of soil types in order to determine the relationship between soil microbiology and chemistry. The results of the microbiological analyses were consistent with geochemical analyses and demonstrated that bacterial community populations appeared to be predominantly determined by soil parent material and soil type.
USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
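
The questions above can be explored with a few lines of Python. The following is a minimal sketch, assuming records shaped as (name, gender, count); the sample rows, the field layout, and the function names are invented for illustration and are not taken from the dataset itself.

```python
from collections import Counter

# Hypothetical sample rows: (name, gender, count). Invented for illustration only.
SAMPLE_ROWS = [
    ("Mary", "F", 7065),
    ("John", "M", 9655),
    ("Anna", "F", 2604),
    ("Mary", "F", 6919),
    ("John", "M", 8769),
    ("Jordan", "M", 12),
    ("Jordan", "F", 9),
]


def most_common_names(rows, gender=None, top=3):
    """Sum counts per name, optionally restricted to one gender."""
    totals = Counter()
    for name, row_gender, count in rows:
        if gender is None or row_gender == gender:
            totals[name] += count
    return totals.most_common(top)


def distinct_names_by_gender(rows):
    """Count how many distinct names appear for each gender."""
    names = {}
    for name, row_gender, _ in rows:
        names.setdefault(row_gender, set()).add(name)
    return {g: len(found) for g, found in names.items()}


if __name__ == "__main__":
    print(most_common_names(SAMPLE_ROWS))
    print(most_common_names(SAMPLE_ROWS, gender="F"))
    print(distinct_names_by_gender(SAMPLE_ROWS))
```

Because names with fewer than 5 occurrences are withheld, any totals computed this way on the real data would slightly undercount rare names.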
Big Free-Tailed Bat Range - CWHR M041 [ds1836]
data.ca.gov
data.cnra.ca.gov
+4more
Updated Mar 9, 2020
Cite
California Department of Fish and Wildlife (2020). Big Free-Tailed Bat Range - CWHR M041 [ds1836] [Dataset]. https://data.ca.gov/dataset/big-free-tailed-bat-range-cwhr-m041-ds1836
Explore at:
Available download formats: arcgis geoservices rest api, html, zip, csv, kml, geojson
Dataset updated
Mar 9, 2020
Dataset authored and provided by
California Department of Fish and Wildlife (https://wildlife.ca.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Vector datasets of CWHR range maps are one component of California Wildlife Habitat Relationships (CWHR), a comprehensive information system and predictive model for California's wildlife. The CWHR System was developed to support habitat conservation and management, land use planning, impact assessment, education, and research involving terrestrial vertebrates in California. CWHR contains information on life history, management status, geographic distribution, and habitat relationships for wildlife species known to occur regularly in California. Range maps represent the maximum, current geographic extent of each species within California. They were originally delineated at a scale of 1:5,000,000 by species-level experts and have gradually been revised at a scale of 1:1,000,000. For more information about CWHR, visit the CWHR webpage (https://www.wildlife.ca.gov/Data/CWHR). The webpage provides links to download CWHR data and user documents such as a look-up table of available range maps including species code, species name, and range map revision history; a full set of CWHR GIS data; .pdf files of each range map or species life history accounts; and a User Guide.
Small Areas Generalised 20m - National Statistical Boundaries - 2015 -...
data.gov.ie
Updated Feb 12, 2024
Cite
data.gov.ie (2024). Small Areas Generalised 20m - National Statistical Boundaries - 2015 - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/small-areas-generalised-20m-national-statistical-boundaries-20151
Explore at:
Dataset updated
Feb 12, 2024
Dataset provided by
data.gov.ie
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Small Area Boundaries were created with the following credentials. National boundary dataset. Consistent sub-divisions of an ED. Created not to cross some natural features. Defined area with a minimum number of GeoDirectory building address points. Defined area initially created with minimum of 65 – approx. average of around 90 residential address points. Generated using two bespoke algorithms which incorporated the ED and Townland boundaries, ortho-photography, large scale vector data and GeoDirectory data. Before the 2011 census they were split in relation to motorways and dual carriageways. After the census some boundaries were merged and others divided to maintain the privacy of the residential area occupants. They are available as generalised and non-generalised boundary sets in the ITM projection. This dataset is provided by Tailte Éireann.
DCM
catalog.data.gov
data.cityofnewyork.us
+1more
Updated Nov 8, 2024
Cite
data.cityofnewyork.us (2024). DCM [Dataset]. https://catalog.data.gov/dataset/dcm
Explore at:
Dataset updated
Nov 8, 2024
Dataset provided by
data.cityofnewyork.us
Description
The Digital City Map (DCM) data represents street lines and other features shown on the City Map, which is the official street map of the City of New York. The City Map consists of 5 different sets of maps, one for each borough, totaling over 8000 individual paper maps. The DCM datasets were created in an ongoing effort to digitize official street records and bring them together with other street information to make them easily accessible to the public. The Digital City Map (DCM) comprises seven datasets: Digital City Map, Street Center Line, City Map Alterations, Arterial Highways and Major Streets, Street Name Changes (areas), Street Name Changes (lines), and Street Name Changes (points). All of the Digital City Map (DCM) datasets are featured on the Streets App. All previously released versions of this data are available at BYTES of the BIG APPLE - Archive. Updates for this dataset, along with other multilayered maps on NYC Open Data, are temporarily paused while they are moved to a new mapping format. Please visit https://www.nyc.gov/site/planning/data-maps/open-data/dwn-digital-city-map.page to utilize this data in the meantime.
GAL GW Quantile Interpolation 20161013
data.gov.au
researchdata.edu.au
+2more
zip
Updated Nov 20, 2019
Cite
Bioregional Assessment Program (2019). GAL GW Quantile Interpolation 20161013 [Dataset]. https://data.gov.au/data/dataset/groups/49f20390-3340-4b08-b1dc-370fb919d34c
Explore at:
Available download formats: zip
Dataset updated
Nov 20, 2019
Dataset provided by
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

This dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement.

The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

The Groundwater (GW) quantiles are extracted from the Groundwater modelling outputs. Dataset prepared for import into the Impact and Risk Analysis Database.

Dataset History

Drawdown percentile and exceedance probability values were extracted from groundwater model outputs. This was performed using a GIS routine to extract groundwater model raster values using the assessment units (as points) attributed with the regional water table aquifer layer and assigning the model value from the corresponding layer to each assessment unit.
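
The extraction step described above can be illustrated with a small sketch. This is not the GIS routine used by the Bioregional Assessment Programme; it assumes a north-up raster stored as rows of values, one raster per aquifer layer, and assessment units given as (id, x, y, layer) tuples. All names, coordinates, and values below are hypothetical.

```python
def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the raster value at point (x, y), or None if outside the grid.

    Assumes row 0 is the northernmost row and column 0 the westernmost.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


def assign_values(units, rasters, x_min, y_max, cell_size):
    """Give each assessment unit the value from the raster of its own layer."""
    result = {}
    for unit_id, x, y, layer in units:
        result[unit_id] = sample_raster(rasters[layer], x_min, y_max, cell_size, x, y)
    return result


# Hypothetical 2 x 2 rasters (e.g. drawdown values), one per aquifer layer.
RASTERS = {
    "layer_a": [[1.0, 2.0], [3.0, 4.0]],
    "layer_b": [[10.0, 20.0], [30.0, 40.0]],
}

# Hypothetical assessment units as points: (id, x, y, water table layer).
UNITS = [
    ("AU1", 150.0, 950.0, "layer_a"),
    ("AU2", 50.0, 850.0, "layer_b"),
    ("AU3", 500.0, 500.0, "layer_a"),  # falls outside the raster extent
]

if __name__ == "__main__":
    print(assign_values(UNITS, RASTERS, x_min=0.0, y_max=1000.0, cell_size=100.0))
```

Looking up the raster by each unit's own layer attribute mirrors the "corresponding layer" assignment in the description; units outside the raster extent are left as None rather than given a guessed value.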

Dataset Citation

XXXX XXX (2017) GAL GW Quantile Interpolation 20161013. Bioregional Assessment Derived Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/49f20390-3340-4b08-b1dc-370fb919d34c.

Dataset Ancestors

Derived From Surface Geology of Australia, 1:2 500 000 scale, 2012 edition

Derived From Galilee Drawdown Rasters

Derived From Galilee model HRV receptors gdb

Derived From Queensland petroleum exploration data - QPED

Derived From Galilee groundwater numerical modelling AEM models

Derived From Galilee drawdown grids

Derived From Three-dimensional visualisation of the Great Artesian Basin - GABWRA

Derived From Geoscience Australia GEODATA TOPO series - 1:1 Million to 1:10 Million scale

Derived From Phanerozoic OZ SEEBASE v2 GIS

Derived From Galilee Hydrological Response Variable (HRV) model

Derived From QLD Department of Natural Resources and Mines Groundwater Database Extract 20142808

Derived From GAL Assessment Units 1000m 20160522 v01

Derived From Galilee Groundwater Model, Hydrogeological Formation Extents v01

Derived From BA ALL Assessment Units 1000m Reference 20160516_v01

Derived From GAL Aquifer Formation Extents v01

Derived From Queensland Geological Digital Data - Detailed state extent, regional. November 2012

Derived From BA ALL Assessment Units 1000m 'super set' 20160516_v01

Derived From GAL Aquifer Formation Extents v02
Big Belly Locations
healthdata.gov
data.boston.gov
application/rdfxml +5
Updated Apr 8, 2025
Cite
data.boston.gov (2025). Big Belly Locations [Dataset]. https://healthdata.gov/dataset/Big-Belly-Locations/fj9v-4mzj
Explore at:
Available download formats: csv, json, tsv, xml, application/rdfxml, application/rssxml
Dataset updated
Apr 8, 2025
Dataset provided by
data.boston.gov
Description
Big Belly trash receptacles are solar powered, internet connected, compacting trash receptacles that can collect up to five times as much waste as traditional bins and help the city more efficiently manage the waste collection process. This data set contains descriptions and geographic coordinates for all of the Big Belly receptacles located within the City.
The Bushland, Texas, Alfalfa Datasets
agdatacommons.nal.usda.gov
s.cnmilf.com
+1more
xlsx
Updated Mar 4, 2024
Cite
Steven R. Evett; Karen S. Copeland; Brice B. Ruthardt; Gary W. Marek; Paul D. Colaizzi; Terry A. Howell; David K. Brauer (2024). The Bushland, Texas, Alfalfa Datasets [Dataset]. http://doi.org/10.15482/USDA.ADC/1526356
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.15482/USDA.ADC/1526356
Dataset updated
Mar 4, 2024
Dataset provided by
Ag Data Commons
Authors
Steven R. Evett; Karen S. Copeland; Brice B. Ruthardt; Gary W. Marek; Paul D. Colaizzi; Terry A. Howell; David K. Brauer
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Bushland, Texas
Description
This parent dataset (collection of datasets) describes the general organization of data in the datasets for each growing season (year) when alfalfa (Medicago sativa L.) was grown as a reference evapotranspiration (ETr) crop at the USDA-ARS Conservation and Production Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU), Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Alfalfa was grown on two large, precision weighing lysimeters, calibrated to NIST standards (Howell et al., 1995). Each lysimeter was in the center of a 4.44 ha square field on which alfalfa was also grown (Evett et al., 2000). The two fields were contiguous and arranged with one (labeled northeast, NE) directly north of the other (labeled southeast, SE). See the resource "Geographic Coordinates, USDA, ARS, Bushland, Texas" for UTM geographic coordinates for field and lysimeter locations. Alfalfa was planted in Autumn 1995 and grown for hay in 1996, 1997, 1998, and 1999. The resource "Agronomic Calendar for the Bushland, Texas Alfalfa Datasets", gives a calendar listing by date the agronomic practices applied, severe weather, and activities (e.g. planting, thinning, fertilization, pesticide application, lysimeter maintenance, harvest) in and on lysimeters that could influence crop growth, water use, and lysimeter data. These include fertilizer and pesticide applications. There is one calendar, from before planting in autumn 1995 to after final harvest in 1999, for the NE and SE lysimeters and fields. There were 4 harvests each year except 1998 when 5 harvests were taken. Irrigation was by linear move sprinkler system equipped with pressure regulated low pressure sprays (mid-elevation spray application, MESA). 
Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings via field-calibrated (Evett and Steiner, 1995) neutron probe from 0.10- to 2.4-m depth in the field. Lysimeters and fields were planted to the same plant density, row spacing, tillage depth (by hand on the lysimeters and by machine in the fields), and fertilizer and pesticide applications. Weighing lysimeters measured relative soil water storage to 0.05 mm accuracy at 5-min intervals, and the 5-min change in soil water storage was used along with precipitation, dew and frost accumulation, and irrigation amounts to calculate crop evapotranspiration (ET), reported at 15-min intervals. Each lysimeter was instrumented to sense wind speed, air temperature and humidity, radiant energy (incoming and reflected, typically both shortwave and longwave), surface temperature, soil heat flux, and soil temperature, all at 15-min intervals. Instruments used changed from season to season, thus subsidiary datasets and data dictionaries for each season are required. The Bushland weighing lysimeter research program is described by Evett et al. (2016), and lysimeter design is described by Marek et al. (1988). Important conventions concerning the data-time correspondence, sign conventions, and terminology specific to the USDA ARS, Bushland, TX, field operations are given in the resource "Conventions for Bushland, TX, Weighing Lysimeter Datasets". There are 5 datasets in this collection. Common symbols and abbreviations used are defined in the resource "Symbols and Abbreviations for Bushland, TX, Weighing Lysimeter Datasets". Datasets consist of Excel (xlsx) files. Each xlsx file contains an Introductory tab that explains the other tabs, lists the authors, describes conventions and symbols used, and lists instruments used. The remaining tabs in a file consist of dictionary and data tabs. The 5 datasets are:

Growth and Yield Data for the Bushland, Texas Alfalfa Datasets
Weighing Lysimeter Data for The Bushland, Texas Alfalfa Datasets
Soil Water Content Data for The Bushland, Texas, Large Weighing Lysimeter Experiments
Evapotranspiration, Irrigation, Dew/frost - Water Balance Data for The Bushland, Texas Alfalfa Datasets
Standard Quality Controlled Research Weather Data – USDA-ARS, Bushland, Texas

See README for descriptions of each dataset. The soil is a Pullman series fine, mixed, superactive, thermic Torrertic Paleustoll. Soil properties are given in the resource titled "Soil Properties for the Bushland, TX, Weighing Lysimeter Datasets". Land slope in the lysimeter fields is
Data from: Stable and Efficient Gaussian Process Calculations
catalog.data.gov
datasets.ai
+1more
Updated Apr 10, 2025
Cite
Dashlink (2025). Stable and Efficient Gaussian Process Calculations [Dataset]. https://catalog.data.gov/dataset/stable-and-efficient-gaussian-process-calculations
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
The use of Gaussian processes can be an effective approach to prediction in a supervised learning environment. For large data sets, the standard Gaussian process approach requires solving very large systems of linear equations and approximations are required for the calculations to be practical. We will focus on the subset of regressors approximation technique. We will demonstrate that there can be numerical instabilities in a well known implementation of the technique. We discuss alternate implementations that have better numerical stability properties and can lead to better predictions. Our results will be illustrated by looking at an application involving prediction of galaxy redshift from broadband spectrum data.
Multivariate Time Series Search
catalog.data.gov
data.wu.ac.at
Updated Apr 11, 2025
Cite
Dashlink (2025). Multivariate Time Series Search [Dataset]. https://catalog.data.gov/dataset/multivariate-time-series-search
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns from these MTS databases which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases, that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem — (1) an R-tree Based Search (RBS) which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several millions of observations. Both these tests show that our algorithms have very high prune rates (>95%) thus needing actual disk access for only less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size we have used in this paper.
Data from: Detecting Anomalies in Multivariate Data Sets with Switching...
s.cnmilf.com
datasets.ai
+3more
Updated Apr 11, 2025
Cite
Dashlink (2025). Detecting Anomalies in Multivariate Data Sets with Switching Sequences and Continuous Streams [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/detecting-anomalies-in-multivariate-data-sets-with-switching-sequences-and-continuous-stre
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
The world-wide aviation system is one of the most complex dynamical systems ever developed and is generating data at an extremely rapid rate. Most modern commercial aircraft record several hundred flight parameters including information from the guidance, navigation, and control systems, the avionics and propulsion systems, and the pilot inputs into the aircraft. These parameters may be continuous measurements or binary or categorical measurements recorded in one-second intervals for the duration of the flight. Currently, most approaches to aviation safety are reactive, meaning that they are designed to react to an aviation safety incident or accident. Here, we discuss a novel approach based on the theory of multiple kernel learning to detect potential safety anomalies in very large databases of discrete and continuous data from world-wide operations of commercial fleets. We pose a general anomaly detection problem which includes both discrete and continuous data streams, where we assume that the discrete streams have a causal influence on the continuous streams. We also assume that atypical sequences of events in the discrete streams can lead to off-nominal system performance. We discuss the application domain and novel algorithms, and also briefly discuss results on synthetic and real-world data sets. Our algorithm uncovers operationally significant events in high-dimensional data streams in the aviation industry which are not detectable using state-of-the-art methods.
Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...
catalog.data.gov
data.nasa.gov
+2more
Updated Apr 11, 2025
Cite
Dashlink (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, a huge amount of flight operational data is downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).
TREC 2022 Deep Learning test collection
catalog.data.gov
s.cnmilf.com
+1more
Updated May 9, 2023
Cite
National Institute of Standards and Technology (2023). TREC 2022 Deep Learning test collection [Dataset]. https://catalog.data.gov/dataset/trec-2022-deep-learning-test-collection
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision). Certain machine learning based methods, such as methods based on deep learning, are known to require very large datasets for training. Lack of such large-scale datasets has been a limitation for developing such methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in the previous years aimed at providing large-scale datasets to TREC and creating a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks. Similar to the previous years, one of the main goals of the track in 2022 is to study what methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision? The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
Big Bend National Park Small-Scale Base GIS Data
catalog.data.gov
datasets.ai
+1more
Updated Jun 5, 2024
Cite
National Park Service (2024). Big Bend National Park Small-Scale Base GIS Data [Dataset]. https://catalog.data.gov/dataset/big-bend-national-park-small-scale-base-gis-data
Explore at:
Dataset updated
Jun 5, 2024
Dataset provided by
National Park Service (http://www.nps.gov/)
Description
This data set contains small-scale base GIS data layers compiled by the National Park Service Servicewide Inventory and Monitoring Program and Water Resources Division for use in a Baseline Water Quality Data Inventory and Analysis Report that was prepared for the park. The report presents the results of surface water quality data retrievals for the park from six of the United States Environmental Protection Agency's (EPA) national databases: (1) Storage and Retrieval (STORET) water quality database management system; (2) River Reach File (RF3) Hydrography; (3) Industrial Facilities Discharges; (4) Drinking Water Supplies; (5) Water Gages; and (6) Water Impoundments. The small-scale GIS data layers were used to prepare the maps included in the report that depict the locations of water quality monitoring stations, industrial discharges, drinking intakes, water gages, and water impoundments. The data layers included in the maps (and this dataset) vary depending on availability, but generally include roads, hydrography, political boundaries, USGS 7.5' minute quadrangle outlines, hydrologic units, trails, and others as appropriate. The scales of each layer vary depending on data source but are generally 1:100,000.
Facilities Database
catalog.data.gov
data.cityofnewyork.us
+1more
Updated Jun 21, 2025
+ more versions
Cite
data.cityofnewyork.us (2025). Facilities Database [Dataset]. https://catalog.data.gov/dataset/facilities-database
Explore at:
Dataset updated
Jun 21, 2025
Dataset provided by
data.cityofnewyork.us
Description
The Department of City Planning aggregates information about 30,000+ facilities and program sites that are owned, operated, funded, licensed, or certified by a City, State, or Federal agency in the City of New York into a central database called the City Planning Facilities Database (FacDB). These facilities generally help to shape quality of life in the city's neighborhoods, and this dataset is the basis for a series of planning activities. This public data resource allows all New Yorkers to understand the breadth of government resources in their neighborhoods. The data is also complemented with a new interactive web map that enables users to easily filter the data for their needs. Users are strongly encouraged to read the database documentation, particularly with regard to analytical limitations. Questions about this database can be directed to dcpopendata@planning.nyc.gov. All previously released versions of this data are available at BYTES of the BIG APPLE Archive.
DCM_StreetNameChanges_Points
catalog.data.gov
data.cityofnewyork.us
+1more
Updated Nov 8, 2024
+ more versions
Cite
data.cityofnewyork.us (2024). DCM_StreetNameChanges_Points [Dataset]. https://catalog.data.gov/dataset/dcm-streetnamechanges-points
Explore at:
Dataset updated
Nov 8, 2024
Dataset provided by
data.cityofnewyork.us
Description
The Digital City Map (DCM) data represents street lines and other features shown on the City Map, which is the official street map of the City of New York. The City Map consists of 5 different sets of maps, one for each borough, totaling over 8000 individual paper maps. The DCM datasets were created in an ongoing effort to digitize official street records and bring them together with other street information to make them easily accessible to the public. The Digital City Map (DCM) comprises seven datasets: Digital City Map, Street Center Line, City Map Alterations, Arterial Highways and Major Streets, Street Name Changes (areas), Street Name Changes (lines), and Street Name Changes (points). All of the Digital City Map (DCM) datasets are featured on the Streets App. All previously released versions of this data are available at BYTES of the BIG APPLE Archive. Updates for this dataset, along with other multilayered maps on NYC Open Data, are temporarily paused while they are moved to a new mapping format. Please visit https://www.nyc.gov/site/planning/data-maps/open-data/dwn-digital-city-map.page to utilize this data in the meantime.