2 datasets found
  1.

    Data from: Mining Distance-Based Outliers in Near Linear Time

    • datasets.ai
    • data.nasa.gov
    • +2 more
    Updated Aug 8, 2024
    Cite
    National Aeronautics and Space Administration (2024). Mining Distance-Based Outliers in Near Linear Time [Dataset]. https://datasets.ai/datasets/mining-distance-based-outliers-in-near-linear-time
    Available download formats: 33
    Dataset updated: Aug 8, 2024
    Dataset authored and provided by: National Aeronautics and Space Administration
    Description

    Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

    Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

  2.

    Mining Distance-Based Outliers in Near Linear Time | gimi9.com

    • gimi9.com
    Cite
    Mining Distance-Based Outliers in Near Linear Time | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_mining-distance-based-outliers-in-near-linear-time/
    Description

    Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

    Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
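
The abstract describes a nested loop over randomly ordered data combined with a pruning rule, but does not spell out the details. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: it assumes an example's outlier score is the distance to its k-th nearest neighbor, and that a candidate can be dropped as soon as its running score falls below the score of the weakest outlier found so far. The function name `top_outliers` and the parameters `k`, `n`, and `seed` are hypothetical.

```python
import heapq
import math
import random


def top_outliers(points, k=2, n=1, seed=0):
    """Return up to n (point, score) pairs with the largest k-th-neighbor distance.

    Assumption: score = distance to the k-th nearest neighbor.
    """
    data = list(points)
    # Random order: non-outliers tend to meet close neighbors early and get pruned.
    random.Random(seed).shuffle(data)

    top = []       # min-heap of (score, index) holding the best n candidates so far
    cutoff = 0.0   # score of the weakest outlier in `top` (0 until `top` is full)

    for i, x in enumerate(data):
        nearest = []  # negated distances: a max-heap of the k smallest seen so far
        pruned = False
        for j, y in enumerate(data):
            if i == j:
                continue
            d = math.dist(x, y)
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            # Once k neighbors are seen, -nearest[0] is an upper bound on x's
            # final score. Below the cutoff, x cannot be a top-n outlier.
            if len(nearest) == k and -nearest[0] < cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]

    return [(data[i], s) for s, i in sorted(top, reverse=True)]


if __name__ == "__main__":
    square_plus_far_point = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_outliers(square_plus_far_point, k=2, n=1))
```

The worst case is still quadratic, since a true outlier is compared against every other example. The saving comes from the inner loop ending early for most non-outliers once the cutoff has risen, which is consistent with the abstract's remark that their processing time does not depend on the size of the data set.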



