100+ datasets found
  1. d

    Data from: PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Apr 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY [Dataset]. https://catalog.data.gov/dataset/padmini-a-peer-to-peer-distributed-astronomy-data-mining-system-and-a-case-study
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY TUSHAR MAHULE, KIRK BORNE, SANDIPAN DEY, SUGANDHA ARORA, AND HILLOL KARGUPTA** Abstract. Peer-to-Peer (P2P) networks are appealing for astronomy data mining from virtual observatories because of the large volume of the data, compute-intensive tasks, potentially large number of users, and distributed nature of the data analysis process. This paper offers a brief overview of PADMINI—a Peer-to-Peer Astronomy Data MINIng system. It also presents a case study on PADMINI for distributed outlier detection using astronomy data. PADMINI is a webbased system powered by Google Sky and distributed data mining algorithms that run on a collection of computing nodes. This paper offers a case study of the PADMINI evaluating the architecture and the performance of the overall system. Detailed experimental results are presented in order to document the utility and scalability of the system.

  2. d

    Data from: Data Mining at NASA: From Theory to Applications

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Aug 23, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Data Mining at NASA: From Theory to Applications [Dataset]. https://catalog.data.gov/dataset/data-mining-at-nasa-from-theory-to-applications
    Explore at:
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    Dashlink
    Description

    NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.

  3. Data for Case Studies

    • figshare.com
    xlsx
    Updated Jul 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    H (2025). Data for Case Studies [Dataset]. http://doi.org/10.6084/m9.figshare.29528183.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    H
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data for case studies.

  4. d

    Data Mining in Systems Health Management

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Apr 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Data Mining in Systems Health Management [Dataset]. https://catalog.data.gov/dataset/data-mining-in-systems-health-management
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.

  5. Beginner Data Mining Datasets

    • kaggle.com
    zip
    Updated May 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    verdecali (2022). Beginner Data Mining Datasets [Dataset]. https://www.kaggle.com/datasets/verdecali/beginner-data-mining-datasets
    Explore at:
    zip(1672021 bytes)Available download formats
    Dataset updated
    May 28, 2022
    Authors
    verdecali
    Description

    These are artificially made beginner data mining datasets for learning purposes.

    Case study:

    • FEELS LIKE HOME is an interior design company, which has about 100 000 registered customers and provide services for more than 200 000 clients annually.
    • The range of the products can be divided in 5 major classes: Decor accessories, Furniture, Textiles, Lighting and Art with an option to purchase Limited Edition versions for an extra charge. These goods can be distributed by 3 channels: Physical stores, yearly catalogs and the companies’ website.
    • FEELS LIKE HOME has been doing a great job during recent years, achieving decent profits and revenues, but the future remains volatile. In order to solve the problem of instability the company is planning to launch new marketing program, especially to improve the accuracy of marketing campaigns.

    The aim of FeelsLikeHome_Campaign dataset is to create project is in which you build a predictive model (using a sample of 2500 clients’ data) forecasting the highest profit from the next marketing campaign, which will indicate the customers who will be the most likely to accept the offer.

    The aim of FeelsLikeHome_Cluster dataset is to create project in which you split company’s customer base on homogenous clusters (using 5000 clients’ data) and propose draft marketing strategies for these groups based on customer behavior and information about their profile.

    FeelsLikeHome_Score dataset can be used to calculate total profit from marketing campaign and for producing a list of sorted customers by the probability of the dependent variable in predictive model problem.

  6. s

    Citation Trends for "Handling missing values in data mining - A case study...

    • shibatadb.com
    Updated May 15, 2012
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yubetsu (2012). Citation Trends for "Handling missing values in data mining - A case study of heart failure dataset" [Dataset]. https://www.shibatadb.com/article/MM55uBht
    Explore at:
    Dataset updated
    May 15, 2012
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txthttps://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2013 - 2022
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Handling missing values in data mining - A case study of heart failure dataset".

  7. Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open...

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne (2023). Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open Data Analysis Platform [Dataset]. http://doi.org/10.1371/journal.pone.0145791
    Explore at:
    docxAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the accumulation of large amounts of health related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized (PPPM) Medicine, ultimately affecting both cost and quality of care. However, high-dimensionality and high-complexity of the data involved, prevents data-driven methods from easy translation into clinically relevant models. Additionally, the application of cutting edge predictive methods and data manipulation require substantial programming skills, limiting its direct exploitation by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments, suited to be applied by the medical community. Moreover, we review code free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner’s Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL process (Extract, Transform, Load) was initiated by retrieving data from the MIMIC-II tables of interest. As use case, correlation of platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for automatic building, parameter optimization and evaluation of various predictive models, under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.

  8. s

    Citation Trends for "Data mining approach to monitoring the requirements of...

    • shibatadb.com
    Updated Apr 15, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yubetsu (2017). Citation Trends for "Data mining approach to monitoring the requirements of the job market: A case study" [Dataset]. https://www.shibatadb.com/article/rYQcEY6o
    Explore at:
    Dataset updated
    Apr 15, 2017
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txthttps://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2017 - 2025
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "Data mining approach to monitoring the requirements of the job market: A case study".

  9. s

    Citation Trends for "The Potential of Text Mining in Data Integration and...

    • shibatadb.com
    Updated Mar 1, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yubetsu (2013). Citation Trends for "The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study onArabidopsis" [Dataset]. https://www.shibatadb.com/article/byvQHhay
    Explore at:
    Dataset updated
    Mar 1, 2013
    Dataset authored and provided by
    Yubetsu
    License

    https://www.shibatadb.com/license/data/proprietary/v1.0/license.txthttps://www.shibatadb.com/license/data/proprietary/v1.0/license.txt

    Time period covered
    2013 - 2022
    Variables measured
    New Citations per Year
    Description

    Yearly citation counts for the publication titled "The Potential of Text Mining in Data Integration and Network Biology for Plant Research: A Case Study onArabidopsis".

  10. u

    Data from: The use of project portfolios in effective strategy execution to...

    • researchdata.up.ac.za
    zip
    Updated May 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Palesa Agnes Ramashala (2023). The use of project portfolios in effective strategy execution to improve business value [Dataset]. http://doi.org/10.25403/UPresearchdata.13280141.v3
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    University of Pretoria
    Authors
    Palesa Agnes Ramashala
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Qualitative data gathered from interviews that were conducted with case organisations. The data is analysed using a qualitative data analysis tool (AtlasTi) to code and generate network diagrams. Software such as Atlas.ti 8 Windows will be a great advantage to use in order to view these results. Interviews were conducted with four case organisations. The details of the responses from the respondents from case organisations are captured. The data gathered during the interview sessions is captured in a tabular form and graphs were also created to identify trends. Also in this study is desktop review of the case organisations that formed part of the study. The desktop study was done using published annual reports over a period of more than seven years. The analysis was done given the scope of the project and its constructs.

  11. Data Mining in Systems Health Management - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Data Mining in Systems Health Management - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/data-mining-in-systems-health-management
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.

  12. Data for Case Studies

    • figshare.com
    xlsx
    Updated Jul 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Derek H (2025). Data for Case Studies [Dataset]. http://doi.org/10.6084/m9.figshare.29526518.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    Derek H
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data for all uncertain parameters.

  13. 99 Little Orange, Technical Business Case

    • kaggle.com
    zip
    Updated Jun 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    IVAN CHAVEZ (2022). 99 Little Orange, Technical Business Case [Dataset]. https://www.kaggle.com/datasets/ivanchvez/99littleorange
    Explore at:
    zip(91998345 bytes)Available download formats
    Dataset updated
    Jun 13, 2022
    Authors
    IVAN CHAVEZ
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    99 Little Orange, Technical Business Case

    Dear candidate, we are so excited with your interest in working with us! This challenge is an opportunity for us to know a bit of the great talent we know you have. It was built to simulate real-case scenarios that you would face while working at [Organization] and is organized in 2 parts:

      1. A technical part of close-ended questions with specific answers that are meant to assess your ability to analyze large amounts of data with SQL to answer key questions.
      1. An analytical part of open-ended questions to assess your ability to build data-backed recommendations to support decision-making. Expect further questions and discussions on top of your answers in the next phase of our hiring process.

    Part I - Technical Provide both the answer and the SQL code used. 1. What is the average trip cost of holidays? How does it compare to non-holidays? 2. Find the average call time of the first time passengers make a trip. 3. Find the average number of trips per driver for every week day. 4. Which day of the week drivers usually drive the most distance on average? 5. What was the growth percentage of rides month over month? 6. Optional. List the top 5 drivers per number of trips in the top 5 largest cities.

    Part II - Analytical 99 is a marketplace, where drivers are the supply and passengers, the demand. One of our main challenges is to keep this marketplace balanced. If there's too much demand, prices would increase due to surge and passengers would prefer not to run. If there's too much supply, drivers would spend more time idle impacting their revenue. 1. Let's say it's 2019-09-23 and a new Operations manager for The Shire was just hired. She has 5 minutes during the Ops weekly meeting to present an overview of the business in the city, and since she's just arrived, she asked your help to do it. What would you prepare for this 5 minutes presentation? Please provide 1-2 slides with your idea. 2. She also mentioned she has a budget to invest in promoting the business. What kind of metrics and performance indicators would you use in order to help her decide if she should invest it into the passenger side or the driver side? Extra point if you provide data-backed recommendations. 3. One month later, she comes back, super grateful for all the helpful insights you have given her. And says she is anticipating a driver supply shortage due to a major concert that is going to take place the next day and also a 3 day city holiday that is coming the next month. What would you do to help her analyze the best course of action to either prevent or minimize the problem in each case? 4. Optional. We want to build up a model to predict “Possible Churn Users” (e.g.: no trips in the past 4 weeks). List all features that you can think about and the data mining or machine learning model or other methods you may use for this case.

  14. The coupling values of the classes in our case study.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulrahman Ahmed Bobakr Baqais; Mohammad Alshayeb (2023). The coupling values of the classes in our case study. [Dataset]. http://doi.org/10.1371/journal.pone.0202629.t005
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Abdulrahman Ahmed Bobakr Baqais; Mohammad Alshayeb
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The coupling values of the classes in our case study.

  15. g

    Retrospective Mining of Tox Data Anemia Case Study RegToxPharm Data |...

    • gimi9.com
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Retrospective Mining of Tox Data Anemia Case Study RegToxPharm Data | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_retrospective-mining-of-tox-data-anemia-case-study-regtoxpharm-data/
    Explore at:
    Description

    Data from a study to critically examine some of the issues of using data from ToxRefDB, a database largely composed of guideline studies for pesticidal active ingredients, using a case study focusing on chemically-induced anemia. This dataset is associated with the following publication: Judson, R.S., M. Martin, G. Patlewicz, and C.E. Wood. (Reg. Tox. Pharm.) Retrospective Mining of Toxicology Data to Discover Multispecies and Chemical Class Effects: Anemia as a Case Study. REGULATORY TOXICOLOGY AND PHARMACOLOGY. Elsevier Science Ltd, New York, NY, USA, 86: 74-92, (2017).

  16. Datasets associated with "Mining of Consumer Product and Purchasing Data to...

    • catalog.data.gov
    • datasets.ai
    Updated Jul 26, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Datasets associated with "Mining of Consumer Product and Purchasing Data to Identify Potential Chemical Co-exposures" [Dataset]. https://catalog.data.gov/dataset/datasets-associated-with-mining-of-consumer-product-and-purchasing-data-to-identify-potent
    Explore at:
    Dataset updated
    Jul 26, 2021
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    Background: Chemicals in consumer products are a major contributor to human chemical co-exposures. Consumers purchase and use a wide variety of products containing potentially thousands of chemicals. There is a need to identify potential real-world chemical co-exposures in order to prioritize in vitro toxicity screening. However, due to the vast number of potential chemical combinations, this has been a major challenge. Objectives: We aim to develop and implement a data-driven procedure for identifying prevalent chemical combinations to which humans are exposed through purchase and use of consumer products. Methods: We applied frequent itemset mining on an integrated dataset linking consumer product chemical ingredient data with product purchasing data from sixty thousand households to identify chemical combinations resulting from co-use of consumer products. Results: We identified co-occurrence patterns of chemicals over all households as well as those specific to demographic groups based on race/ethnicity, income, education, and family composition. We also identified chemicals with the highest potential for aggregate exposure by identifying chemicals occurring in multiple products used by the same household. Lastly, a case study of chemicals active in estrogen and androgen receptor in silico models revealed priority chemical combinations co-targeting receptors involved in important biological signaling pathways. Discussion: Integration and comprehensive analysis of household purchasing data and product-chemical information provided a means to assess human near-field exposure and inform selection of chemical combinations for high-throughput screening in in vitro assays. This dataset is associated with the following publication: Stanfield, Z., C. Addington, K. Dionisio, D. Lyons, R. Tornero-Velez, K. Phillips, T. Buckley, and K. Isaacs. Mining of consumer product and purchasing data to identify potential chemical co-exposures.. ENVIRONMENTAL HEALTH PERSPECTIVES. National Institute of Environmental Health Sciences (NIEHS), Research Triangle Park, NC, USA, 129(6): N/A, (2021).

  17. Results of running KHC on our case study.

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abdulrahman Ahmed Bobakr Baqais; Mohammad Alshayeb (2023). Results of running KHC on our case study. [Dataset]. http://doi.org/10.1371/journal.pone.0202629.t011
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Abdulrahman Ahmed Bobakr Baqais; Mohammad Alshayeb
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Results of running KHC on our case study.

  18. A summary of related hashtags (top group) and related place mentions (bottom...

    • plos.figshare.com
    xls
    Updated Jun 17, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexander Savelyev; Alan M. MacEachren (2023). A summary of related hashtags (top group) and related place mentions (bottom group) identified with each particular meta-path. [Dataset]. http://doi.org/10.1371/journal.pone.0206906.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Alexander Savelyev; Alan M. MacEachren
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A summary of related hashtags (top group) and related place mentions (bottom group) identified with each particular meta-path.

  19. m

    Optimum Layout of Sublevel Stoping Mines (Case Study)

    • data.mendeley.com
    Updated Aug 31, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Juan Yarmuch (2017). Optimum Layout of Sublevel Stoping Mines (Case Study) [Dataset]. http://doi.org/10.17632/zvtw8zc2p6.1
    Explore at:
    Dataset updated
    Aug 31, 2017
    Authors
    Juan Yarmuch
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provide the ".lp" and the ".sol" files for the case study published in: J.L Yarmuch, Grigaliunas M.C., Munizaga-Rosas J.C. "Optimum Layout of Sublevel Stoping Mines"

  20. d

    Data from: Mining Distance-Based Outliers in Near Linear Time

    • catalog.data.gov
    • datasets.ai
    Updated Apr 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Mining Distance-Based Outliers in Near Linear Time [Dataset]. https://catalog.data.gov/dataset/mining-distance-based-outliers-in-near-linear-time
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Dashlink (2025). PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY [Dataset]. https://catalog.data.gov/dataset/padmini-a-peer-to-peer-distributed-astronomy-data-mining-system-and-a-case-study

Data from: PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY

Related Article
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description

PADMINI: A PEER-TO-PEER DISTRIBUTED ASTRONOMY DATA MINING SYSTEM AND A CASE STUDY TUSHAR MAHULE, KIRK BORNE, SANDIPAN DEY, SUGANDHA ARORA, AND HILLOL KARGUPTA** Abstract. Peer-to-Peer (P2P) networks are appealing for astronomy data mining from virtual observatories because of the large volume of the data, compute-intensive tasks, potentially large number of users, and distributed nature of the data analysis process. This paper offers a brief overview of PADMINI—a Peer-to-Peer Astronomy Data MINIng system. It also presents a case study on PADMINI for distributed outlier detection using astronomy data. PADMINI is a webbased system powered by Google Sky and distributed data mining algorithms that run on a collection of computing nodes. This paper offers a case study of the PADMINI evaluating the architecture and the performance of the overall system. Detailed experimental results are presented in order to document the utility and scalability of the system.

Search
Clear search
Close search
Google apps
Main menu