100+ datasets found
  1. How to download GIS data using filtering tools

    • data-monmouthnj.hub.arcgis.com
    Updated Jul 28, 2022
    Cite
    Monmouth County NJ GIS (2022). How to download GIS data using filtering tools [Dataset]. https://data-monmouthnj.hub.arcgis.com/documents/82c62feaeca4456e95a2028586af083f
    Dataset updated
    Jul 28, 2022
    Dataset authored and provided by
    Monmouth County NJ GIS
    Description

    Esri's ArcGIS Online tools provide three methods of filtering larger datasets using attribute or geospatial information that is part of each individual dataset. These instructions give a basic overview of the steps a GeoHub end user can take to filter out unnecessary data, or to home in on a particular location, and then download only the filtered information. Filters can be applied through the search bar, by what is seen on the map, or with the attribute filters in the Data tab.

  2. Results of data filtering and peak finding

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 21, 2013
    Cite
    Hurowitz, Evan H.; Drori, Iddo; Stodden, Victoria C.; Donoho, David L.; Brown, Patrick O. (2013). Results of data filtering and peak finding [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001655995
    Dataset updated
    Feb 21, 2013
    Authors
    Hurowitz, Evan H.; Drori, Iddo; Stodden, Victoria C.; Donoho, David L.; Brown, Patrick O.
    Description

    Since these microarrays contained duplicated spots, the parentheses represent the number of unique spots or profiles in the dataset.

  3. Data for Filtering Organized 3D Point Clouds for Bin Picking Applications

    • catalog.data.gov
    • datasets.ai
    Updated Apr 11, 2024
    + more versions
    Cite
    National Institute of Standards and Technology (2024). Data for Filtering Organized 3D Point Clouds for Bin Picking Applications [Dataset]. https://catalog.data.gov/dataset/data-for-filtering-organized-3d-point-clouds-for-bin-picking-applications
    Dataset updated
    Apr 11, 2024
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Contains scans of a bin filled with different parts (screws, nuts, rods, spheres, sprockets). For each part type, an RGB image and an organized 3D point cloud obtained with a structured light sensor are provided. In addition, an unorganized 3D point cloud representing an empty bin and a small Matlab script to read the files are provided. The 3D data contain many outliers and were used to demonstrate a new filtering technique.

  4. data-filtering-statistics

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Loubna Ben Allal (2023). data-filtering-statistics [Dataset]. https://huggingface.co/datasets/loubnabnl/data-filtering-statistics
    Dataset updated
    Aug 8, 2023
    Authors
    Loubna Ben Allal
    Description

    Filters applied on top of near-dedup + line filtering:

    • Comments filtering: at least 1% of the number of lines should be comments/docstrings
    • Stars filtering: minimum of 5 stars

    Language   | Before filtering | Stars    | Comments ratio | More near-dedup | Tokenizer fertility
    Python     | 75.61 GB         | 26.56 GB | 65.64 GB       | 61.97 GB        | 72.52 GB
    Java       | 110 GB           | 35.83 GB | 92.7 GB        | 88.42 GB        | 105.47 GB
    Javascript | 82.7 GB          | 20.76 GB | 57.5 GB        | 65.09 GB        | 76.37 GB
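
The comments and stars filters described above could be sketched as two independent predicates. This is a minimal sketch, not the dataset author's implementation: the function names are hypothetical, only `#` line comments are counted (docstrings are ignored), and the ratio is taken over non-empty lines, all of which are assumptions.

```python
def comment_ratio(source: str) -> float:
    """Fraction of non-empty lines that are '#' line comments.

    Assumption: docstrings are not counted, and empty lines are excluded.
    """
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    comments = sum(1 for ln in lines if ln.startswith("#"))
    return comments / len(lines)


def passes_comment_filter(source: str, min_ratio: float = 0.01) -> bool:
    """Keep a file when at least 1% of its lines are comments."""
    return comment_ratio(source) >= min_ratio


def passes_star_filter(stars: int, min_stars: int = 5) -> bool:
    """Keep a file when its repository has at least 5 stars."""
    return stars >= min_stars


# Hypothetical sample file used only for illustration.
sample = "# load the data\nx = 1\ny = 2\n"
keep_by_comments = passes_comment_filter(sample)
keep_by_stars = passes_star_filter(7)
```

The two predicates are kept separate because the table reports each filter as its own column rather than as a combined result.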

  5. Number of Animals After Data Filtering.

    • datasetcatalog.nlm.nih.gov
    Updated Sep 29, 2016
    Cite
    Hess, Melanie K.; Hess, Andrew S.; Garrick, Dorian J. (2016). Number of Animals After Data Filtering. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001575419
    Dataset updated
    Sep 29, 2016
    Authors
    Hess, Melanie K.; Hess, Andrew S.; Garrick, Dorian J.
    Description

    Number of Animals After Data Filtering.

  6. Overview of respondents’ profile after data filtering (M = mean, SD =...

    • datasetcatalog.nlm.nih.gov
    Updated Dec 21, 2021
    Cite
    Gentner, Alexandre; Stapel, Jork; Nordhoff, Sina; He, Xiaolin; Happee, Riender (2021). Overview of respondents’ profile after data filtering (M = mean, SD = standard deviation, relative frequencies, n = number of respondents). [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000738717
    Dataset updated
    Dec 21, 2021
    Authors
    Gentner, Alexandre; Stapel, Jork; Nordhoff, Sina; He, Xiaolin; Happee, Riender
    Description

    Overview of respondents’ profile after data filtering (M = mean, SD = standard deviation, relative frequencies, n = number of respondents).

  7. Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB...

    • berd-platform.de
    Updated Jul 31, 2025
    Cite
    Peter Henderson; Mark S Krass; Lucia Zheng; Neel Guha; Christopher D. Manning; Dan Jurafsky; Daniel E. Ho (2025). Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset [Dataset]. http://doi.org/10.82939/s2wta-09974
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    ArXiv
    Authors
    Peter Henderson; Mark S Krass; Lucia Zheng; Neel Guha; Christopher D. Manning; Dan Jurafsky; Daniel E. Ho
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    We curate a large corpus of legal and administrative data. The utility of this data is twofold: (1) to aggregate legal and administrative data sources that demonstrate different norms and legal standards for data filtering; (2) to collect a dataset that can be used in the future for pretraining legal-domain language models, a key direction in access-to-justice initiatives.

  8. TREC 2002 FILTERING DATASET

    • catalog.data.gov
    • gimi9.com
    Updated Mar 14, 2025
    Cite
    National Institute of Standards and Technology (2025). TREC 2002 FILTERING DATASET [Dataset]. https://catalog.data.gov/dataset/trec-2002-filtering-dataset
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Given a topic description and some example relevant documents, build a filtering profile which will select the most relevant examples from an incoming stream of documents. In the TREC 2002 filtering task we will continue to stress adaptive filtering. However, the batch filtering and routing tasks will also be available.

  9. FILTER BY PLATE

    • data.cityofnewyork.us
    csv, xlsx, xml
    Updated Nov 30, 2025
    Cite
    Department of Finance (DOF) (2025). FILTER BY PLATE [Dataset]. https://data.cityofnewyork.us/City-Government/FILTER-BY-PLATE/p79k-edsi
    Available download formats: xlsx, csv, xml
    Dataset updated
    Nov 30, 2025
    Authors
    Department of Finance (DOF)
    Description

    Check out our data lens page for additional data filtering and sorting options: https://data.cityofnewyork.us/view/i4p3-pe6a

    This dataset contains Open Parking and Camera Violations issued by the City of New York. Updates will be applied to this data set on the following schedule:

    New or open tickets will be updated weekly (Sunday). Tickets satisfied will be updated daily (Tuesday through Sunday). NOTE: Summonses that have been written-off are indicated by blank financials.

    Summons images will not be available during scheduled downtime on Sunday - Monday from 1:00 am to 2:30 am and on Sundays from 5:00 am to 10:00 am.

    • Initial dataset loaded 05/14/2016.
  10. River Thames eDNA temporal metabarcoding study: Results from data filtering

    • rdr.ucl.ac.uk
    bin
    Updated Oct 31, 2023
    Cite
    Jane Hallam; Elizabeth Clare; John Iwan Jones; Julia Day (2023). River Thames eDNA temporal metabarcoding study: Results from data filtering [Dataset]. http://doi.org/10.5522/04/23684637.v2
    Available download formats: bin
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    University College London
    Authors
    Jane Hallam; Elizabeth Clare; John Iwan Jones; Julia Day
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    River Thames
    Description

    Sequencing results from filtering raw sequence data from environmental DNA metabarcoding samples of River Thames fish communities.

    Samples were collected from two sites during 2019 over 12 months from the Thames Basin, London, U.K., sampling a minimum of every week. Site 1. River Lee (freshwater) and site 2. Richmond Lock, Thames River (tidal). Samples were amplified with the primer set MiFish-U.

    The file is an Excel workbook of the sequencing results from filtering the raw sequence data (file "Temporal_eDNA_GC-EC-9225.tar.gz") through the pipeline DADA2: providing ASV IDs, sample and ASV table with readcounts, and fish names.

    For further information on filtering settings see the published paper.

    Hallam J, Clare EL, Jones JI, Day JJ. (2023) Fine-scale environmental DNA metabarcoding provides rapid and effective monitoring of fish community dynamics. Environmental DNA. DOI:10.1002/edn3.486

  11. Data from: Bilateral Filtering Dataset

    • universe.roboflow.com
    zip
    Updated Feb 6, 2023
    + more versions
    Cite
    college (2023). Bilateral Filtering Dataset [Dataset]. https://universe.roboflow.com/college-kdlgd/bilateral-filtering
    Available download formats: zip
    Dataset updated
    Feb 6, 2023
    Dataset authored and provided by
    college
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Nodules Bounding Boxes
    Description

    Bilateral Filtering

    ## Overview
    
    Bilateral Filtering is a dataset for object detection tasks - it contains Nodules annotations for 280 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  12. amazon-product-data-filter

    • huggingface.co
    Updated Nov 14, 2023
    + more versions
    Cite
    Iftach Arbel (2023). amazon-product-data-filter [Dataset]. https://huggingface.co/datasets/iarbel/amazon-product-data-filter
    Dataset updated
    Nov 14, 2023
    Authors
    Iftach Arbel
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for "amazon-product-data-filter"

    Dataset Summary

    The Amazon Product Dataset contains product listing data from the Amazon US website. It can be used for various NLP and classification tasks, such as text generation, product type classification, attribute extraction, image recognition and more.

    Languages

    The text in the dataset is in English.

    Dataset Structure

    Data Instances

    Each data point provides product information, such… See the full description on the dataset page: https://huggingface.co/datasets/iarbel/amazon-product-data-filter.

  13. Data from: Removing Spikes While Preserving Data and Noise using Wavelet...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Removing Spikes While Preserving Data and Noise using Wavelet Filter Banks [Dataset]. https://catalog.data.gov/dataset/removing-spikes-while-preserving-data-and-noise-using-wavelet-filter-banks
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Many diagnostic datasets suffer from the adverse effects of spikes that are embedded in data and noise. For example, this is true for electrical power system data where the switches, relays, and inverters are major contributors to these effects. Spikes are mostly harmful to the analysis of data in that they throw off real-time detection of abnormal conditions, and classification of faults. Since noise and spikes are mixed together and embedded within the data, removal of the unwanted signals from the data is not always easy and may result in losing the integrity of the information carried by the data. Additionally, in some applications noise and spikes need to be filtered independently. The proposed algorithm is a multi-resolution filtering approach based on Haar wavelets that is capable of removing spikes while incurring insignificant damage to other data. In particular, noise in the data, which is a useful indicator that a sensor is healthy and not stuck, can be preserved using our approach. Presented here is the theoretical background with some examples from a realistic testbed.

  14. MAN filter

    • data.cityofnewyork.us
    csv, xlsx, xml
    Updated Nov 23, 2025
    Cite
    Department of Finance (DOF) (2025). MAN filter [Dataset]. https://data.cityofnewyork.us/City-Government/MAN-filter/n7bm-ibuu
    Available download formats: xlsx, csv, xml
    Dataset updated
    Nov 23, 2025
    Authors
    Department of Finance (DOF)
    Description

    Check out our data lens page for additional data filtering and sorting options: https://data.cityofnewyork.us/view/i4p3-pe6a

    This dataset contains Open Parking and Camera Violations issued by the City of New York. Updates will be applied to this data set on the following schedule:

    New or open tickets will be updated weekly (Sunday). Tickets satisfied will be updated daily (Tuesday through Sunday). NOTE: Summonses that have been written-off are indicated by blank financials.

    Summons images will not be available during scheduled downtime on Sunday - Monday from 1:00 am to 2:30 am and on Sundays from 5:00 am to 10:00 am.

    • Initial dataset loaded 05/14/2016.
  15. DataCurBench

    • huggingface.co
    Cite
    ai_author, DataCurBench [Dataset]. https://huggingface.co/datasets/anonymousaiauthor/DataCurBench
    Authors
    ai_author
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📖 Overview

    DataCurBench is a dual-task benchmark suite measuring large language models’ ability to autonomously perform data filtering (selecting high-quality samples) and data cleaning (enhancing linguistic form) for pre-training corpora. It comprises two configurations—data_filtering and data_cleaning—each with English (en) and Chinese (zh) splits. This design helps researchers evaluate LLMs on real-world curation pipelines and pinpoint areas for improvement in end-to-end data… See the full description on the dataset page: https://huggingface.co/datasets/anonymousaiauthor/DataCurBench.

  16. Water Quality Metrics & Filter Performance Dataset

    • kaggle.com
    zip
    Updated Dec 19, 2024
    Cite
    SwekeRR (2024). Water Quality Metrics & Filter Performance Dataset [Dataset]. https://www.kaggle.com/datasets/swekerr/water-quality-metrics-and-filter-performance-dataset
    Available download formats: zip (422849 bytes)
    Dataset updated
    Dec 19, 2024
    Authors
    SwekeRR
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Water Quality Metrics and Filter Performance Dataset

    Description

    This dataset provides simulated data on various water quality parameters and their impact on the performance of water filtration systems. The dataset includes 19K+ samples, with attributes such as Total Dissolved Solids (TDS), turbidity, pH, water depth, and flow discharge. These parameters are used to estimate the filter life span (in hours) and filter efficiency (in percentage) under different conditions.

    All the conditions for each feature are based on data found on the Internet.

    The dataset is ideal for exploring relationships between water quality metrics and filter performance, building predictive models, or conducting data analysis for environmental and engineering studies.

    Features

    • TDS (mg/l): Total Dissolved Solids in milligrams per liter (values < 500 mg/l).
    • Turbidity (NTU): Measurement of water clarity in Nephelometric Turbidity Units (values < 10 NTU).
    • pH: Measure of acidity/alkalinity (range: 6.0 to 8.5).
    • Depth (m): Water depth in meters (range: 0.5 to 5 m).
    • Flow Discharge (L/min): Water flow rate in liters per minute (range: 1 to 100 L/min).
    • Filter Life Span (hours): Estimated lifespan of the filter based on input parameters (minimum value capped at 500 hours).
    • Filter Efficiency (%): Estimated filtration efficiency (minimum value capped at 75%).

    Applications

    • Predicting filter performance based on water quality parameters.
    • Analyzing the impact of water quality on filter lifespan and efficiency.
    • Training machine learning models for environmental monitoring.

    Note: This dataset is entirely synthetic and created for educational and research purposes. It does not represent real-world measurements but can be used to simulate scenarios for water filtration system analysis.
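
The documented feature ranges above lend themselves to a simple row-level sanity check before modelling. The sketch below is one way to do it; the dictionary keys are hypothetical column names (the actual CSV header is not shown in this listing), and the sample rows are invented for illustration.

```python
# Bounds taken from the feature list above. Keys are hypothetical column
# names, not confirmed against the real file.
CHECKS = {
    "tds_mg_l": lambda v: v < 500,               # values < 500 mg/l
    "turbidity_ntu": lambda v: v < 10,           # values < 10 NTU
    "ph": lambda v: 6.0 <= v <= 8.5,             # range 6.0 to 8.5
    "depth_m": lambda v: 0.5 <= v <= 5,          # range 0.5 to 5 m
    "flow_l_min": lambda v: 1 <= v <= 100,       # range 1 to 100 L/min
    "filter_life_h": lambda v: v >= 500,         # minimum capped at 500 hours
    "filter_efficiency_pct": lambda v: v >= 75,  # minimum capped at 75%
}


def violations(row: dict) -> list:
    """Return the names of the features that fall outside the documented ranges."""
    return [name for name, ok in CHECKS.items() if not ok(row[name])]


# Invented example rows, for illustration only.
good_row = {
    "tds_mg_l": 320.0, "turbidity_ntu": 4.2, "ph": 7.1, "depth_m": 2.0,
    "flow_l_min": 40.0, "filter_life_h": 900.0, "filter_efficiency_pct": 88.0,
}
bad_row = dict(good_row, ph=9.1)

rows = [good_row, bad_row]
clean_rows = [r for r in rows if not violations(r)]
```

Returning the list of failing feature names, rather than a bare boolean, makes it easier to see which documented bound a row breaks.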

  17. Filter Import Data India – Buyers & Importers List

    • seair.co.in
    + more versions
    Cite
    Seair Exim, Filter Import Data India – Buyers & Importers List [Dataset]. https://www.seair.co.in
    Available download formats: .bin, .xml, .csv, .xls
    Dataset provided by
    Seair Info Solutions PVT LTD
    Authors
    Seair Exim
    Area covered
    India, United States
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  18. Data from: Estimating Dynamic Models Using Kalman Filtering

    • dataverse.harvard.edu
    Updated Dec 21, 2009
    Cite
    Nathaniel Beck (2009). Estimating Dynamic Models Using Kalman Filtering [Dataset]. http://doi.org/10.7910/DVN/TRRVNY
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 21, 2009
    Dataset provided by
    Harvard Dataverse
    Authors
    Nathaniel Beck
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Kalman filter is useful to estimate dynamic models via maximum likelihood. To do this, the model must be set up in state space form. This article shows how various models of interest can be set up in that form. Models considered are Auto Regressive-Moving Average (ARMA) models with measurement error and dynamic factor models. The filter is used to estimate models of presidential approval. A test of rational expectations in approval shows the hypothesis not to hold. The filter is also used to deal with missing approval data and to study whether interpolation of missing data is an adequate technique. Finally, a dynamic factor analysis of government entrepreneurial activity is performed. Appendices go through the mathematical details of the filter and show how to implement it in the computer language GAUSS.
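
To make the state space idea and the treatment of missing data concrete, here is a minimal sketch of a Kalman filter for a local-level model (a random walk observed with noise). It is not the article's ARMA or dynamic factor model and is written in Python rather than GAUSS; the function name, the variances `q` and `r`, and the sample series are all assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence


def kalman_local_level(ys: Sequence[Optional[float]], q: float, r: float,
                       m0: float = 0.0, p0: float = 1.0) -> List[float]:
    """Filtered state means for a local-level model.

    state:       x[t] = x[t-1] + w,  w ~ N(0, q)
    observation: y[t] = x[t] + v,    v ~ N(0, r)
    A missing observation (None) skips the update step, so the estimate
    carries forward while its variance grows by q.
    """
    m, p = m0, p0
    means = []
    for y in ys:
        p = p + q                # predict step: variance grows
        if y is not None:        # update step only when data exist
            k = p / (p + r)      # Kalman gain
            m = m + k * (y - m)
            p = (1.0 - k) * p
        means.append(m)
    return means


# Hypothetical approval series with one missing month.
series = [52.0, 50.5, None, 49.0]
estimates = kalman_local_level(series, q=0.5, r=2.0, m0=50.0, p0=10.0)
```

Skipping the update for a missing value lets the filter handle gaps directly, which is the alternative to interpolation that the description mentions.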

  19. Ar data filtering

    • kaggle.com
    zip
    Updated Jan 19, 2021
    Cite
    TW PROJECT (2021). Ar data filtering [Dataset]. https://www.kaggle.com/datasets/twproject/ar-data-deneme/suggestions
    Available download formats: zip (1171128038 bytes)
    Dataset updated
    Jan 19, 2021
    Authors
    TW PROJECT
    Description

    Dataset

    This dataset was created by TW PROJECT

    Contents

  20. Data from: Model Adaptation for Prognostics in a Particle Filtering...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). Model Adaptation for Prognostics in a Particle Filtering Framework [Dataset]. https://catalog.data.gov/dataset/model-adaptation-for-prognostics-in-a-particle-filtering-framework
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    One of the key motivating factors for using particle filters for prognostics is the ability to include model parameters as part of the state vector to be estimated. This performs model adaptation in conjunction with state tracking, and thus produces a tuned model that can be used for long-term predictions. This feature of particle filters works in large part because they are not subject to the “curse of dimensionality”, i.e. the exponential growth of computational complexity with state dimension. However, in practice, this property holds only for “well-designed” particle filters as dimensionality increases. This paper explores the notion of wellness of design in the context of predicting remaining useful life for individual discharge cycles of Li-ion batteries. Prognostic metrics are used to analyze the tradeoff between different model designs and prediction performance. Results demonstrate how sensitivity analysis may be used to arrive at a well-designed prognostic model that can take advantage of the model adaptation properties of a particle filter.
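
The idea of putting model parameters into the state vector can be illustrated with a small bootstrap particle filter. This is a toy sketch under stated assumptions, not the paper's battery model: the dynamics `x[t+1] = a * x[t]`, the priors, the noise levels, and the function name are all hypothetical.

```python
import math
import random


def particle_filter_decay(observations, n_particles=2000, obs_sd=0.1,
                          param_jitter_sd=0.005, seed=0):
    """Bootstrap particle filter for x[t+1] = a * x[t] with unknown a.

    Each particle carries an augmented state (x, a), so resampling adapts
    the model parameter a while it tracks the state x.
    """
    rng = random.Random(seed)
    # Assumed priors: x0 near the first observation, a uniform on [0.5, 1.0].
    particles = [(observations[0] + rng.gauss(0.0, obs_sd), rng.uniform(0.5, 1.0))
                 for _ in range(n_particles)]
    for y in observations[1:]:
        # Propagate: jitter the parameter slightly, then apply the model.
        moved = []
        for x, a in particles:
            a_new = a + rng.gauss(0.0, param_jitter_sd)
            moved.append((a_new * x, a_new))
        # Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x, _ in moved]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights
        # Resample with replacement, proportional to weight.
        particles = rng.choices(moved, weights=weights, k=n_particles)
    x_est = sum(x for x, _ in particles) / n_particles
    a_est = sum(a for _, a in particles) / n_particles
    return x_est, a_est


# Noise-free synthetic observations from a decay rate of 0.9 (illustrative).
true_a = 0.9
obs = [10.0 * true_a ** t for t in range(20)]
x_est, a_est = particle_filter_decay(obs)
```

The small jitter on `a` keeps parameter diversity alive after resampling; with only two state dimensions this toy case does not exercise the dimensionality concerns the description raises.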
