38 datasets found
  1. Data from: Enriching time series datasets using Nonparametric kernel...

    • figshare.com
    Updated May 31, 2023
    Cite
    Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mohamad Ivan Fanany
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Improving the accuracy of predictions of future values based on past and current observations has usually been pursued by enhancing prediction methods, combining those methods, or pre-processing the data. In this paper, another approach is taken: increasing the number of inputs in the dataset. This approach is especially useful for shorter time series. By filling in the in-between values of the time series, the size of the training set can be increased, thus improving the generalization capability of the predictor. The algorithm used to make predictions is a Neural Network, as it is widely used in the literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO patents and PubMed scientific publications in the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series dataset, designated for the NN3 Competition in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in in-between data in the time series. Furthermore, detrending and deseasonalization, which separate the data into trend, seasonal, and stationary components, also improve prediction performance on both the original and the filled datasets. The optimal enlargement of the dataset in this experiment is about five times the length of the original dataset.
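As a rough sketch of the in-filling idea described above, the snippet below fills in-between values with a Gaussian Nadaraya-Watson kernel smoother and enlarges the series fivefold. The kernel choice, bandwidth, and the toy series are assumptions for illustration, not the paper's exact setup.

```python
import math

def nw_estimate(xs, ys, x, bandwidth=1.0):
    """Nadaraya-Watson kernel regression: Gaussian-weighted mean of ys at x."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def enrich_series(xs, ys, factor=5, bandwidth=1.0):
    """Fill in-between values so the enriched series is `factor` times longer."""
    n_new = len(xs) * factor
    step = (xs[-1] - xs[0]) / (n_new - 1)
    grid = [xs[0] + i * step for i in range(n_new)]
    return grid, [nw_estimate(xs, ys, x, bandwidth) for x in grid]

# Hypothetical short series: 6 observations, enriched to 30 points.
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 1.4, 2.1, 2.9, 3.8, 5.2]
grid, filled = enrich_series(xs, ys)
```

The enriched `(grid, filled)` pairs would then serve as extra training examples for the predictor; since the estimate is a weighted mean, every filled value stays within the range of the original observations.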

  2. Data Analysis for the Systematic Literature Review of DL4SE

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    College of William and Mary
    Washington and Lee University
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An EDA comprises a set of statistical and data mining procedures used to describe data. We ran an EDA to provide statistical facts and inform conclusions. The mined facts support the arguments that shape the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:

    Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

    Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.

    Transformation. In this stage, we omitted to use any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce 35 features into 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibit the maximum reduction in variance. In other words, it helped us to identify the number of clusters to be used when tuning the explainable models.

    Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented toward uncovering hidden relationships among the extracted features (Correlations and Association Rules) and categorizing the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

    Interpretation/Evaluation. We used knowledge discovery to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

    Support = the number of occurrences in which the statement is true, divided by the total number of statements.
    Confidence = the support of the statement divided by the number of occurrences of the premise.
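The Support and Confidence definitions above can be checked with a small sketch. The paper tags below are invented for illustration and are not the actual DL4SE feature table:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, premise, conclusion):
    """Support of premise-and-conclusion, divided by support of the premise."""
    return support(transactions, premise | conclusion) / support(transactions, premise)

# Hypothetical paper tags: 3 of 4 papers use supervised learning,
# and 2 of those are also marked irreproducible.
papers = [
    {"supervised", "irreproducible"},
    {"supervised", "irreproducible"},
    {"supervised", "reproducible"},
    {"unsupervised", "reproducible"},
]
sup = support(papers, {"supervised", "irreproducible"})        # 2/4
conf = confidence(papers, {"supervised"}, {"irreproducible"})  # (2/4)/(3/4)
```

So the rule "Supervised Learning implies irreproducible" would here have support 0.5 and confidence 2/3, matching the example rule in the overview.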

  3. Global Data Mining Software Market Report 2025 Edition, Market Size, Share,...

    • cognitivemarketresearch.com
    Updated Jun 2, 2025
    Cite
    Cognitive Market Research (2025). Global Data Mining Software Market Report 2025 Edition, Market Size, Share, CAGR, Forecast, Revenue [Dataset]. https://www.cognitivemarketresearch.com/data-mining-software-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Data Mining Software market size will be USD XX million in 2025. It will expand at a compound annual growth rate (CAGR) of XX% from 2025 to 2031.

    North America held the major market share, accounting for more than XX% of the global revenue with a market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031. Europe accounted for a market share of over XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Asia Pacific held a market share of around XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Latin America had a market share of more than XX% of the global revenue with a market size of USD XX million in 2025 and will grow at a CAGR of XX% from 2025 to 2031. Middle East and Africa had a market share of around XX% of the global revenue, was estimated at a market size of USD XX million in 2025, and will grow at a CAGR of XX% from 2025 to 2031.

    KEY DRIVERS

    Increasing Focus on Customer Satisfaction to Drive Data Mining Software Market Growth

    In today’s hyper-competitive and digitally connected marketplace, customer satisfaction has emerged as a critical factor for business sustainability and growth. The growing focus on enhancing customer satisfaction is proving to be a significant driver in the expansion of the data mining software market. Organizations are increasingly leveraging data mining tools to sift through vast volumes of customer data—ranging from transactional records and website activity to social media engagement and call center logs—to uncover insights that directly influence customer experience strategies.

    Data mining software empowers companies to analyze customer behavior patterns, identify dissatisfaction triggers, and predict future preferences. Through techniques such as classification, clustering, and association rule mining, businesses can break down large datasets to understand what customers want, what they are likely to purchase next, and how they feel about the brand. These insights not only help in refining customer service but also in shaping product development, pricing strategies, and promotional campaigns. For instance, Netflix uses data mining to recommend personalized content by analyzing a user's viewing history, ratings, and preferences. This has led to increased user engagement and retention, highlighting how a deep understanding of customer preferences—made possible through data mining—can translate into competitive advantage.

    Moreover, companies are increasingly using these tools to create highly targeted and customer-specific marketing campaigns. By mining data from e-commerce transactions, browsing behavior, and demographic profiles, brands can tailor their offerings and communications to suit individual customer segments. For instance, Amazon continuously mines customer purchasing and browsing data to deliver personalized product recommendations, tailored promotions, and timely follow-ups. This not only enhances customer satisfaction but also significantly boosts conversion rates and average order value. According to a report by McKinsey, personalization can deliver five to eight times the ROI on marketing spend and lift sales by 10% or more—a powerful incentive for companies to adopt data mining software as part of their customer experience toolkit. (Source: https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/personalizing-at-scale#/)

    The utility of data mining tools extends beyond e-commerce and streaming platforms. In the banking and financial services industry, for example, institutions use data mining to analyze customer feedback, call center transcripts, and usage data to detect pain points and improve service delivery. Bank of America, for instance, utilizes data mining and predictive analytics to monitor customer interactions and provide proactive service suggestions or fraud alerts, significantly improving user satisfaction and trust. (Source: https://futuredigitalfinance.wbresearch.com/blog/bank-of-americas-erica-client-interactions-future-ai-in-banking) Similarly, telecom companies like Vodafone use data mining to understand customer churn behavior and implement retention strategies based on insights drawn from service usage patterns and complaint histories. In addition to p...

  4. Comparison of 14 classifiers

    • figshare.com
    Updated Jun 11, 2023
    Cite
    Jacques Wainer (2023). Comparison of 14 classifiers [Dataset]. http://doi.org/10.6084/m9.figshare.3407932.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jacques Wainer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data, programs, results, and analysis software for the paper "Comparison of 14 different families of classification algorithms on 115 binary data sets" https://arxiv.org/abs/1606.00930

  5. Data from: PREDICTION OF RANKING OF LOTS OF CORN SEEDS BY ARTIFICIAL...

    • scielo.figshare.com
    Updated Jun 14, 2023
    Cite
    Gizele I. Gadotti; Nicacia A. B. Moraes; Joseano G. da Silva; Romário de M. Pinheiro; Rita de C. M. Monteiro (2023). PREDICTION OF RANKING OF LOTS OF CORN SEEDS BY ARTIFICIAL INTELLIGENCE [Dataset]. http://doi.org/10.6084/m9.figshare.20551630.v1
    Explore at:
    Available download formats: tiff
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Gizele I. Gadotti; Nicacia A. B. Moraes; Joseano G. da Silva; Romário de M. Pinheiro; Rita de C. M. Monteiro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT The seed sector faces several challenges in ensuring quick and accurate decision-making when working with large amounts of data on the physiological quality of seed lots, which makes the process time-consuming and inefficient. Thus, artificial intelligence (AI) emerges as a new technological option in the seed sector to solve database problems in the post-harvest stages. This study aims to use machine learning to classify maize seed lots. Data were obtained from eight maize seed crops from a private company. These data were mined using the following classifiers: J48 (DecisionTree), RandomForest, CVR (ClassificationViaRegression), IBk (lazy.IBk), MLP (MultiLayerPerceptron), and NaïveBayes. Cross-validation was used for evaluation, with the dataset, including training and testing data, divided into 10 subsets. The described steps were performed using the Weka software. The results obtained allow the classification of maize seed lots with high accuracy and precision, and these algorithms can better classify a maize seed lot through vigor attributes, enabling more accurate decision-making based on vigor tests in a reduced evaluation time.
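The 10-subset cross-validation protocol mentioned above can be sketched in a few lines. This is an illustrative stand-in, not Weka's implementation (which also stratifies folds by class); the 1-nearest-neighbour classifier and the 1-D "vigor" scores are invented for the example.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, labels, classify, k=10):
    """Mean accuracy over k folds; `classify(train_x, train_y, x)` -> label."""
    folds = k_fold_indices(len(data), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [data[i] for i in range(len(data)) if i not in held_out]
        train_y = [labels[i] for i in range(len(data)) if i not in held_out]
        hits = sum(classify(train_x, train_y, data[i]) == labels[i]
                   for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)

def one_nn(train_x, train_y, x):
    """1-nearest-neighbour stand-in classifier on 1-D points."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

# Hypothetical 1-D "vigor" scores: well-separated low and high groups.
data = [0.1, 9.0, 0.3, 8.8, 0.2, 9.1, 0.4, 8.7, 0.15, 9.2,
        0.25, 8.9, 0.35, 9.3, 0.05, 8.6, 0.45, 9.4, 0.12, 8.5]
labels = ["low", "high"] * 10
acc = cross_validate(data, labels, one_nn, k=10)
```

Each of the 10 folds is held out once while the remaining 9 train the classifier, and the reported accuracy is the mean over folds.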

  6. Data from: QuerTCI: A Tool Integrating GitHub Issue Querying with Comment...

    • data.niaid.nih.gov
    Updated Feb 21, 2022
    Cite
    Ye Paing; Tatiana Castro Vélez; Raffi Khatchadourian (2022). QuerTCI: A Tool Integrating GitHub Issue Querying with Comment Classification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6115403
    Explore at:
    Dataset updated
    Feb 21, 2022
    Dataset provided by
    City University of New York (CUNY) Hunter College
    City University of New York (CUNY) Graduate Center
    Authors
    Ye Paing; Tatiana Castro Vélez; Raffi Khatchadourian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Issue tracking systems enable users and developers to comment on problems plaguing a software system. Empirical Software Engineering (ESE) researchers study (open-source) project issues and the comments and threads within to discover---among others---challenges developers face when, e.g., incorporating new technologies, platforms, and programming language constructs. However, issue discussion threads accumulate over time and thus can become unwieldy, hindering any insight that researchers may gain. While existing approaches alleviate this burden by classifying issue thread comments, there is a gap between searching popular open-source software repositories (e.g., those on GitHub) for issues containing particular keywords and feeding the results into a classification model. In this paper, we demonstrate a research infrastructure tool called QuerTCI that bridges this gap by integrating the GitHub issue comment search API with the classification models found in existing approaches. Using queries, ESE researchers can retrieve GitHub issues containing particular keywords, e.g., those related to a certain programming language construct, and subsequently classify the kinds of discussions occurring in those issues. Using our tool, our hope is that ESE researchers can uncover challenges related to particular technologies using certain keywords through popular open-source repositories more seamlessly than previously possible. A tool demonstration video may be found at: https://youtu.be/fADKSxn0QUk.

  7. Global Text Mining Software Market Research Report: By Application...

    • wiseguyreports.com
    Updated Sep 15, 2025
    + more versions
    Cite
    (2025). Global Text Mining Software Market Research Report: By Application (Sentiment Analysis, Content Classification, Information Extraction, Text Categorization, Topic Modeling), By Deployment Type (On-premise, Cloud-based, Hybrid), By End User (Healthcare, Retail, Education, Finance, Government), By Organization Size (Small Enterprises, Medium Enterprises, Large Enterprises), By Output Format (Structured Data, Unstructured Data, Visualization Reports) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/text-mining-software-market
    Explore at:
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 2.93 (USD Billion)
    MARKET SIZE 2025: 3.22 (USD Billion)
    MARKET SIZE 2035: 8.5 (USD Billion)
    SEGMENTS COVERED: Application, Deployment Type, End User, Organization Size, Output Format, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: growing data volume, rising demand for insights, advancements in natural language processing, increasing adoption of AI technologies, need for competitive intelligence
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: RapidMiner, IBM, Clarabridge, Lexalytics, Oracle, Tableau, Dell Technologies, Information Builders, SAP, MonkeyLearn, Microsoft, Talend, TIBCO Software, SAS Institute, Alteryx, Qlik
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Increased demand for data analytics, Integration with artificial intelligence, Growth in social media monitoring, Expansion in healthcare applications, Rising need for consumer sentiment analysis
    COMPOUND ANNUAL GROWTH RATE (CAGR): 10.2% (2025 - 2035)
  8. Comparative Analysis of Artificial Hydrocarbon Networks and Data-Driven...

    • figshare.com
    Updated Dec 31, 2016
    Cite
    LUIS MIRALLES (2016). Comparative Analysis of Artificial Hydrocarbon Networks and Data-Driven Approaches for Human Activity Recognition [Dataset]. http://doi.org/10.6084/m9.figshare.4508744.v1
    Explore at:
    Available download formats: html
    Dataset updated
    Dec 31, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    LUIS MIRALLES
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, advances in computing and sensing technologies have contributed to the development of effective human activity recognition systems. In context-aware and ambient assisted living applications, the classification of body postures and movements aids the development of health systems that improve the quality of life of the disabled and the elderly. In this paper we describe a comparative analysis of data-driven activity recognition techniques against a novel supervised learning technique called artificial hydrocarbon networks (AHN). We show that artificial hydrocarbon networks are suitable for efficient classification of body postures and movements, providing a comparison between their performance and that of other well-known supervised learning methods.

  9. Data from: Classifying microarray cancer datasets using nearest subspace...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Michael C. Cohen; Kuldip K. Paliwal (2022). Classifying microarray cancer datasets using nearest subspace classification [Dataset]. http://doi.org/10.4225/03/5a13727393276
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Michael C. Cohen; Kuldip K. Paliwal
    Description

    In this paper we implement and test the recently described nearest subspace classifier on a range of microarray cancer datasets. Its classification accuracy is tested against nearest neighbor and nearest centroid algorithms, and is shown to give a significant improvement. This classification system uses class-dependent PCA to construct a subspace for each class. Test vectors are assigned the class label of the nearest subspace, which is defined as the minimum reconstruction error across all subspaces. Furthermore, we demonstrate this distance measure is equivalent to the null-space component of the vector being analyzed. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
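A small pure-Python sketch of the nearest subspace idea follows. It substitutes a Gram-Schmidt basis of the centered class vectors for the class-dependent PCA (both span the same class subspace when no components are discarded), and the 3-D data and labels are invented for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def scale(a, s):
    return [x * s for x in a]

def orthonormal_basis(vectors, eps=1e-10):
    """Gram-Schmidt orthonormalisation of the centered class vectors."""
    basis = []
    for v in vectors:
        for b in basis:
            v = sub(v, scale(b, dot(v, b)))
        norm = dot(v, v) ** 0.5
        if norm > eps:
            basis.append(scale(v, 1.0 / norm))
    return basis

def reconstruction_error(x, mean, basis):
    """Norm of the residual after projecting x - mean onto the class subspace."""
    r = sub(x, mean)
    for b in basis:
        r = sub(r, scale(b, dot(r, b)))
    return dot(r, r) ** 0.5

def nearest_subspace(train, x):
    """train: {label: [vectors]}; returns the label whose subspace
    reconstructs x with minimum error."""
    best_err, best_label = None, None
    for class_label, vecs in train.items():
        mean = scale([sum(col) for col in zip(*vecs)], 1.0 / len(vecs))
        basis = orthonormal_basis([sub(v, mean) for v in vecs])
        err = reconstruction_error(x, mean, basis)
        if best_err is None or err < best_err:
            best_err, best_label = err, class_label
    return best_label

# Invented 3-D data: class "A" lies in the z=0 plane, class "B" in z=5.
train = {
    "A": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 1.0, 0.0]],
    "B": [[0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 5.0]],
}
label = nearest_subspace(train, [1.0, 1.0, 0.1])
```

The residual norm here is exactly the null-space component mentioned in the abstract: the part of the test vector not explained by the class subspace.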

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd: 2008: Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  10. Data from: Prediction of paediatric asthma hospitalisation using data mining...

    • bridges.monash.edu
    Updated Nov 21, 2017
    Cite
    Schmidt, Sam; Gang Li; Yi-Ping Phoebe Chen (2017). Prediction of paediatric asthma hospitalisation using data mining techniques [Dataset]. http://doi.org/10.4225/03/5a1372a1685b1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 21, 2017
    Dataset provided by
    Monash University
    Authors
    Schmidt, Sam; Gang Li; Yi-Ping Phoebe Chen
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    Research into the prevalence of hospitalisation among childhood asthma cases was undertaken using a dataset local to the Barwon region of Victoria. Participants were parents/guardians responding on behalf of children aged between 5 and 11 years. Various data mining techniques were used, including segmentation, association, and classification, to assist in predicting and exploring instances of childhood hospitalisation due to asthma. Results from this study indicate that children in inner-city and metropolitan areas may overutilise emergency department services. In addition, this study found that the predicted likelihood of hospitalisation for asthma in children was greater for those with a written asthma management plan. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd: 2008: Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  11. Data from: The strong convergence of visual classification method and its...

    • bridges.monash.edu
    Updated Nov 21, 2017
    Cite
    Meng, Deyu; Xu, Zongben; Leung, Yee; Fung, Tung (2017). The strong convergence of visual classification method and its applications in disease diagnosis [Dataset]. http://doi.org/10.4225/03/5a1371f709257
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 21, 2017
    Dataset provided by
    Monash University
    Authors
    Meng, Deyu; Xu, Zongben; Leung, Yee; Fung, Tung
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    The visual classification method is introduced as a learning strategy for pattern classification problems in bioinformatics. In this paper, we show the strong convergence property of the proposed method. In particular, the method is shown to converge to the Bayes estimator, i.e., the learning error of the method tends to achieve the minimal posterior expected value. The method is successfully applied to some practical disease diagnosis problems. The experimental results all verify the validity and effectiveness of the theoretical conclusions. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd: 2008: Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  12. Novel classification scheme for temporal genomic and proteomic problems

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Simon Kocbek; Gregor Stiglic; Mateja Verlic; Peter Kokol (2022). Novel classification scheme for temporal genomic and proteomic problems [Dataset]. http://doi.org/10.4225/03/5a1373171c74f
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Simon Kocbek; Gregor Stiglic; Mateja Verlic; Peter Kokol
    Description

    For over a decade, genomic and proteomic datasets have presented a challenge for various statistical and machine learning methods. Most microarray- or mass spectrometry-based datasets consist of a small number of samples with a large number of gene or protein expression measurements, but in the past few years new types of datasets with an additional time component have become available. This type of dataset offers new opportunities for the development of new classification and gene selection techniques, where one of the problems is the reduction of high dimensionality. This paper presents a novel classification technique which combines feature extraction and feature selection to obtain the optimal set of genes available to a classifier. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd: 2008: Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  13. Data from: Gene expression analysis for tumor classification using vector...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Cite
    Edna Márquez; Ana María Espinosa; Jaime Berumen; Christian Lemaitre (2022). Gene expression analysis for tumor classification using vector quantization [Dataset]. http://doi.org/10.4225/03/5a137205bd04a
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Edna Márquez; Ana María Espinosa; Jaime Berumen; Christian Lemaitre
    Description

    Gene expression analysis is one of the most important tasks for genomic medicine, as it makes it possible to classify tumors, which are directly related to the development of cancer. This paper presents a clustering method for tumor classification, vector quantization, using gene expression profiles from mRNA microarrays with samples of cervical cancer and normal cervix. Vector quantization is used to divide the space into regions, and the centroids of the regions represent patients with tumors or healthy ones. The regions found by the vector quantizer are also used as the basis for classifying other tumors, which could help in the prognosis of the illness or in finding new groups of tumors. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
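The region-and-centroid idea above can be sketched with plain Lloyd iterations (the standard way to train a vector quantizer). The 1-D "expression profiles", the initial codebook, and the two-region setup are assumptions for illustration, not the paper's data.

```python
def nearest(centroids, v):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(c, v)) for c in centroids]
    return dists.index(min(dists))

def lloyd(vectors, centroids, iters=10):
    """Plain Lloyd iterations: assign each vector to its nearest centroid,
    then move each centroid to the mean of its region."""
    for _ in range(iters):
        regions = [[] for _ in centroids]
        for v in vectors:
            regions[nearest(centroids, v)].append(v)
        centroids = [
            [sum(col) / len(r) for col in zip(*r)] if r else c
            for r, c in zip(regions, centroids)
        ]
    return centroids

# Invented 1-D "expression profiles": a low- and a high-expression group.
profiles = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]]
codebook = lloyd(profiles, [[0.0], [1.0]])
region = nearest(codebook, [5.1])  # new sample lands in the high-expression region
```

Once trained, the codebook partitions the space into regions, and a new profile is classified by the region (centroid) it falls into, mirroring how the abstract describes classifying further tumors.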

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd: 2008: Melbourne, Australia). Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  14. LANDFIRE.HI_120ESP

    • catalog.data.gov
    Updated Nov 11, 2021
    + more versions
    U.S. Geological Survey (2021). LANDFIRE.HI_120ESP [Dataset]. https://catalog.data.gov/dataset/landfire-hi-120esp
    Explore at:
    Dataset updated
    Nov 11, 2021
    Dataset provided by
    U.S. Geological Survey
    Description

    The LANDFIRE vegetation layers describe the following elements of existing and potential vegetation for each LANDFIRE mapping zone: environmental site potentials, biophysical settings, existing vegetation types, canopy cover, and vegetation height. Vegetation is mapped using predictive landscape models based on extensive field reference data, satellite imagery, biophysical gradient layers, and classification and regression trees. DATA SUMMARY: The environmental site potential (ESP) data layer represents the vegetation that could be supported at a given site based on the biophysical environment. Map units are named according to NatureServe's Ecological Systems classification, which is a nationally consistent set of mid-scale ecological units (Comer and others 2003). Usage of these classification units to describe environmental site potential, however, differs from the original intent of Ecological Systems as units of existing vegetation. As used in LANDFIRE, map unit names represent the natural plant communities that would become established at late or climax stages of successional development in the absence of disturbance. They reflect the current climate and physical environment, as well as the competitive potential of native plant species. The ESP layer is similar in concept to other approaches to classifying potential vegetation in the western United States, including habitat types (for example, Daubenmire 1968 and Pfister and others 1977) and plant associations (for example, Henderson and others 1989). It is important to note that ESP is an abstract concept and represents neither current nor historical vegetation. To create the ESP data layer, we first assign field plots to one of the ESP map unit classes. Go to http://www.landfire.gov/participate_acknowledgements.php for more information regarding contributors of field plot data. 
Assignments are based on presence and abundance of indicator plant species recorded on the plots and on the ecological amplitude and competitive potential of these species. We then intersect plot locations with a series of 30-meter spatially explicit gradient layers. Most of the gradient layers used in the predictive modeling of ESP are derived using the WX-BGC simulation model (Keane and Holsinger, in preparation; Keane and others 2002). WX-BGC simulations are based largely on spatially extrapolated weather data from DAYMET (Thornton and others 1997; Thornton and Running 1999; http://www.daymet.org/ ) and on soils data in STATSGO (NRCS 1994). Additional indirect gradient layers, such as elevation, slope, and indices of topographic position, are also used. We use data from plot locations to develop predictive classification tree models, using See5 data mining software (Quinlan 1993; Rulequest Research 1997), for each LANDFIRE map zone. These decision trees are applied spatially to predict the ESP for every pixel across the landscape. Finally, ESP pixel values are, in some cases, modified based on a comparison with the LANDFIRE existing vegetation type (EVT) layer created with the use of 30-meter Landsat ETM satellite imagery. We make such modifications only in non-vegetated areas (such as water, rock, snow, or ice) and where information in the EVT layer clearly enables a better depiction of the environmental site potential concept. Although the ESP data layer is intended to represent current site potential, the actual time period for this data set is variable. The weather data used in DAYMET were compiled from 1980 to 1997. Refer to spatial metadata for date ranges of field plot data and satellite imagery for each LANDFIRE map zone. A number of changes were implemented for the LF2010 ESP product that worked with this original data. 
LF2010 updates to mapping EVT map units for Barren, Snow-Ice, and Water were translated to the LF2010 ESP product so those map units coincide with the EVT. Subsequent to that, each ESP map unit was stratified spatially in two different ways. First, each ESP map unit was stratified by LANDFIRE map zone. Second, each ESP map unit was stratified by an ESP life form classification layer that incorporated NLCD 2001 data, LF2001 EVC data, a Vegetation Change Tracker (VCT) dataset (Huang, 2010), and National Wetlands Inventory (NWI) data. These layers were leveraged against each other to determine areas of stable Sparse, Upland Herb, Upland Shrub, Upland Woodland, Upland Forest, Wetland Shrub-herb, Wetland Forest, Wetland Shrub, and Wetland Herb. Areas mapped as agriculture, urban, barren, snow-ice, and water were described as Undetermined.
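The per-pixel prediction step described above (applying a trained classification tree spatially) can be illustrated with a drastically simplified sketch. The rule, class names, and input layers below are hypothetical; real LANDFIRE trees are built with See5 from field plots and many biophysical gradient layers.

```python
# Hypothetical, drastically simplified tree; illustrative only.
def esp_rule(elevation_m, moisture_index):
    """Toy classification tree assigning an ESP class to one pixel
    from two gradient-layer values."""
    if elevation_m > 1500:
        return "subalpine_forest" if moisture_index > 0.4 else "montane_shrubland"
    return "lowland_forest" if moisture_index > 0.6 else "grassland"

# Apply the tree spatially: one prediction per (30-m) pixel, here a 2x2 grid.
elevation = [[2000, 1600], [800, 900]]
moisture = [[0.5, 0.2], [0.7, 0.3]]
esp = [[esp_rule(e, m) for e, m in zip(er, mr)]
       for er, mr in zip(elevation, moisture)]
```

The point of the sketch is only the mapping workflow: a decision tree trained on plot data is evaluated once per pixel of the gradient layers to produce the ESP raster.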

  15. Data from: A novel protein motif finding algorithm for classification of the...

    • researchdata.edu.au
    • bridges.monash.edu
    Updated May 5, 2022
    Deng-Kuan Sun; Tong-Liang Zhang; Yong-Sheng Ding (2022). A novel protein motif finding algorithm for classification of the ligase subfamilies [Dataset]. http://doi.org/10.4225/03/5a1371c69c0e3
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Monash University
    Authors
    Deng-Kuan Sun; Tong-Liang Zhang; Yong-Sheng Ding
    Description

    The algorithm of extracting motifs from a family or subfamily is still a hot spot in bioinformatics. It not only contributes to understanding the functions of proteins and predicting the classification to which an unknown protein sequence belongs, but also helps to study protein-protein interactions. In this paper, we present a novel algorithm to extract motifs of a subfamily, based on feature selection and position connection. Position connection is applied to generate motifs, in a hybrid method with a vote-based decision-making mechanism to construct the classifier of the ligase subfamilies. Through testing on the database, a predictive accuracy of more than 95.87% is achieved. The result demonstrates that this novel method is practical. In addition, the method shows that motifs play an important role in classifying proteins and in studying the characteristics of the subfamilies or families of a protein database. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
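    The vote-based decision mechanism can be sketched roughly as follows. The motif strings and subfamily names are invented for illustration; the paper's motifs come from feature selection and position connection, not literal substring matching.

```python
def classify_by_motif_vote(sequence, subfamily_motifs):
    """Toy vote-based classifier: each subfamily receives one vote
    per motif it matches in the sequence; the subfamily with the
    most votes wins. Motifs here are plain substrings."""
    votes = {fam: sum(m in sequence for m in ms)
             for fam, ms in subfamily_motifs.items()}
    return max(votes, key=votes.get)

# Hypothetical motif strings and subfamily names, for illustration only.
motifs = {
    "ligase_A": ["GXGKT", "HIGH"],
    "ligase_B": ["KMSKS", "FDLD"],
}
family = classify_by_motif_vote("MAHIGHLVGXGKTQQ", motifs)
```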

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  16. Classification for unweighted and weighted data.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Jens Keilwagen; Ivo Grosse; Jan Grau (2023). Classification for unweighted and weighted data. [Dataset]. http://doi.org/10.1371/journal.pone.0092209.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Jens Keilwagen; Ivo Grosse; Jan Grau
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The entries of a confusion matrix have been calculated for a classification threshold of 1.5. In case of unweighted data, the class label is if and otherwise .
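    A thresholded confusion matrix of the kind described can be computed as in this sketch. The positive-when-score-at-least-threshold convention is an assumption, since the paper's exact labeling rule did not survive extraction; the scores and labels are invented.

```python
def confusion_matrix(scores, labels, threshold=1.5):
    """Confusion-matrix entries for a score threshold: predict the
    positive class (1) when score >= threshold, negative (0) otherwise."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return tp, fp, fn, tn

# Toy scores/labels; the paper's threshold of 1.5 is used as-is.
tp, fp, fn, tn = confusion_matrix([2.0, 1.0, 1.8, 0.3], [1, 1, 0, 0])
```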

  17. Microarray time-series data classification via multiple alignment of gene...

    • bridges.monash.edu
    • researchdata.edu.au
    pdf
    Updated Nov 21, 2017
    Bari, Ataul; Rueda, Luis; Ngom, Alioune (2017). Microarray time-series data classification via multiple alignment of gene expression profiles [Dataset]. http://doi.org/10.4225/03/5a1371a04a06e
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Nov 21, 2017
    Dataset provided by
    Monash University
    Authors
    Bari, Ataul; Rueda, Luis; Ngom, Alioune
    License

    http://rightsstatements.org/vocab/InC/1.0/

    Description

    Pairwise alignment approaches for time-varying gene expression profiles have recently been developed for the detection of co-expressions in time-series microarray data sets. In this paper, we analyze multiple expression profile alignment (MEPA) methods for classifying microarray time-course data. We apply a nearest centroid classification technique, in which the centroid of each class is computed by means of a MEPA algorithm. MEPA aligns the expression profiles in such a way as to minimize the total area between all aligned profiles. We propose four MEPA approaches whose effectiveness is demonstrated on the well-known budding yeast, S. cerevisiae, data set. PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
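    The nearest-centroid rule itself can be sketched as below. Computing the class centroids via MEPA alignment is the paper's contribution and is not reproduced here; the centroids, class names, and profile values are invented inputs.

```python
def nearest_centroid_label(profile, class_centroids):
    """Nearest-centroid classification: assign the class whose
    centroid profile is closest in Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(class_centroids,
               key=lambda c: dist(profile, class_centroids[c]))

# Hypothetical per-class centroid expression profiles.
centroids = {"G1_phase": [1.0, 2.0, 1.0], "S_phase": [3.0, 0.5, 2.5]}
label = nearest_centroid_label([1.1, 1.9, 1.2], centroids)
```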

    Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology ; Chetty, Madhu ; Ahmad, Shandar ; Ngom, Alioune ; Teng, Shyh Wei ; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia) ; Coverage: Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.

  18. Data from: QSAR-Co: An Open Source Software for Developing Robust...

    • datasetcatalog.nlm.nih.gov
    • acs.figshare.com
    Updated Nov 25, 2020
    Cordeiro, M. Natália D. S.; Ambure, Pravin; Halder, Amit Kumar; Díaz, Humbert González (2020). QSAR-Co: An Open Source Software for Developing Robust Multitasking or Multitarget Classification-Based QSAR Models [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000528917
    Explore at:
    Dataset updated
    Nov 25, 2020
    Authors
    Cordeiro, M. Natália D. S.; Ambure, Pravin; Halder, Amit Kumar; Díaz, Humbert González
    Description

    Quantitative structure–activity relationships (QSAR) modeling is a well-known computational technique with wide applications in fields such as drug design, toxicity predictions, nanomaterials, etc. However, QSAR researchers still face certain problems to develop robust classification-based QSAR models, especially while handling response data pertaining to diverse experimental and/or theoretical conditions. In the present work, we have developed an open source standalone software “QSAR-Co” (available to download at https://sites.google.com/view/qsar-co) to setup classification-based QSAR models that allow mining the response data coming from multiple conditions. The software comprises two modules: (1) the Model development module and (2) the Screen/Predict module. This user-friendly software provides several functionalities required for developing a robust multitasking or multitarget classification-based QSAR model using linear discriminant analysis or random forest techniques, with appropriate validation, following the principles set by the Organisation for Economic Co-operation and Development (OECD) for applying QSAR models in regulatory assessments.

  19. Continuous Road Edge Case Mining Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Dataintelo (2025). Continuous Road Edge Case Mining Market Research Report 2033 [Dataset]. https://dataintelo.com/report/continuous-road-edge-case-mining-market
    Explore at:
    csv, pptx, pdfAvailable download formats
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Continuous Road Edge Case Mining Market Outlook



    According to our latest research, the global Continuous Road Edge Case Mining market size reached USD 1.16 billion in 2024, driven by the accelerating adoption of advanced analytics and artificial intelligence in automotive and transportation sectors. The market is expected to grow at a robust CAGR of 17.8% during the forecast period, reaching an estimated USD 5.18 billion by 2033. This significant growth is underpinned by the rising demand for enhanced road safety, the proliferation of autonomous vehicles, and the increasing integration of real-time data analytics in traffic management systems.



    One of the primary growth factors for the Continuous Road Edge Case Mining market is the rapid advancement in autonomous vehicle technologies. As automotive OEMs and technology companies race to develop fully autonomous vehicles, the need for comprehensive edge case mining solutions becomes paramount. Edge cases—rare or unusual scenarios encountered on the road—pose significant challenges for the safe deployment of autonomous vehicles. Continuous road edge case mining leverages machine learning and big data analytics to identify, catalog, and address these scenarios, ensuring that vehicles can safely navigate even the most unpredictable conditions. This not only enhances the safety and reliability of autonomous vehicles but also accelerates their path to commercial deployment.



    Another critical driver is the increasing emphasis on road safety and regulatory compliance. Governments and transportation agencies worldwide are mandating stricter safety standards for both autonomous and human-driven vehicles. Continuous road edge case mining enables organizations to proactively detect potential hazards and anomalies in real-world driving environments, facilitating timely interventions and policy adjustments. By systematically analyzing vast amounts of driving data, these solutions help stakeholders reduce accident rates, improve traffic flow, and ensure compliance with evolving safety regulations. The growing collaboration between public agencies and private sector innovators is further fueling the adoption of these technologies.



    The proliferation of connected infrastructure and the rise of smart cities are also propelling the growth of the Continuous Road Edge Case Mining market. With the deployment of IoT sensors, high-definition cameras, and connected traffic management systems, unprecedented volumes of real-time data are being generated. Continuous edge case mining systems can harness this data to provide actionable insights for urban planners, traffic authorities, and automotive manufacturers. The integration of these solutions into smart city initiatives is enabling more efficient traffic management, reducing congestion, and enhancing overall urban mobility. This trend is particularly pronounced in regions with significant investments in digital infrastructure, such as North America, Europe, and Asia Pacific.



    From a regional perspective, North America currently leads the global market, accounting for the largest share in 2024, followed closely by Europe and Asia Pacific. The region’s dominance is attributed to the early adoption of autonomous vehicle technologies, a robust ecosystem of technology providers, and supportive regulatory frameworks. Meanwhile, Asia Pacific is emerging as the fastest-growing market, driven by rapid urbanization, increasing investments in smart transportation, and the presence of leading automotive manufacturers. Europe continues to make significant strides, propelled by stringent safety regulations and a strong focus on innovation in mobility solutions.



    Component Analysis



    The Component segment of the Continuous Road Edge Case Mining market is broadly categorized into Software, Hardware, and Services. Each component plays a vital role in the overall ecosystem, contributing to the efficiency and effectiveness of edge case mining solutions. Software solutions form the backbone of the market, encompassing advanced analytics platforms, machine learning algorithms, and data visualization tools. These software solutions enable the automated identification and classification of edge cases from vast datasets, facilitating continuous improvement in vehicle safety and performance. The demand for customizable and scalable software platforms is on the rise, as organizations seek to tailor solutions to their specific operational needs.



    Hardwar

  20. Data from: Unobtrusive Mattress-based Identification of Hypertension by...

    • commons.datacite.org
    • figshare.com
    Updated Jan 16, 2019
    Fan Liu (2019). Unobtrusive Mattress-based Identification of Hypertension by Integrating Classification and Association Rule Mining [Dataset]. http://doi.org/10.6084/m9.figshare.7594433.v1
    Explore at:
    Dataset updated
    Jan 16, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    DataCite (https://www.datacite.org/)
    Authors
    Fan Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It contains 128 BCG recordings (61 hypertensive and 67 normotensive) and the software code of the association classifier.
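    As a rough, hypothetical illustration of an association classifier of the kind named in the title (the feature names, rule antecedents, and confidences below are invented, not taken from the recordings or the released code):

```python
def classify_with_rules(features, rules, default="normotensive"):
    """Toy class-association-rule classifier: apply the first rule
    (in decreasing order of confidence) whose antecedent holds."""
    for antecedent, label, confidence in sorted(rules, key=lambda r: -r[2]):
        if all(features.get(k) == v for k, v in antecedent.items()):
            return label
    return default

# Hypothetical mined rules: (antecedent, class, confidence).
rules = [
    ({"hr_variability": "low", "resp_effort": "high"}, "hypertensive", 0.9),
    ({"hr_variability": "high"}, "normotensive", 0.8),
]
label = classify_with_rules({"hr_variability": "low", "resp_effort": "high"}, rules)
```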
