30 datasets found
  1. Overall comparison of proposed enhanced DBSCAN with other variants of...

    • plos.figshare.com
    xls
    Updated Dec 19, 2024
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana (2024). Overall comparison of proposed enhanced DBSCAN with other variants of DBSCAN. [Dataset]. http://doi.org/10.1371/journal.pone.0313890.t004
    Available download formats: xls
    Dataset updated: Dec 19, 2024
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Overall comparison of proposed enhanced DBSCAN with other variants of DBSCAN.

  2. List of augmentations selected.

    • plos.figshare.com
    xls
    Updated Dec 19, 2024
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana (2024). List of augmentations selected. [Dataset]. http://doi.org/10.1371/journal.pone.0313890.t002
    Available download formats: xls
    Dataset updated: Dec 19, 2024
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Perimeter Intrusion Detection Systems (PIDS) are crucial for protecting physical locations by detecting and responding to intrusions around their perimeters. Despite the availability of several PIDS, challenges remain in detection accuracy and precise activity classification. To address these challenges, a new machine learning model is developed. The model uses the pre-trained InceptionV3 network for feature extraction on a PID intrusion image dataset, followed by t-SNE for dimensionality reduction and subsequent clustering. On high-dimensional data, the existing Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm faces efficiency issues owing to its complexity and to varying densities. To overcome these limitations, this research enhances the traditional DBSCAN algorithm: in the enhanced DBSCAN, distances between minimal points are determined with the Manhattan distance formula, together with an estimate of the epsilon value. The effectiveness of the proposed model is evaluated by comparing it with state-of-the-art techniques from the literature. The analysis shows that the proposed model achieved a silhouette score of 0.86, which the comparative techniques did not match. This research contributes to societal security by improving location perimeter protection, and future researchers can use the developed model for human activity recognition from image datasets.
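
    The abstract above describes DBSCAN with Manhattan distances and evaluates the result with a silhouette score. As an illustration only, the sketch below implements classic DBSCAN with the Manhattan (L1) metric, plus a silhouette score on the same metric, in plain Python, assuming low-dimensional points such as t-SNE output. It does not reproduce the authors' epsilon-estimation step: `eps` and `min_pts` are hand-picked assumptions, noise points are excluded from the silhouette (one common convention), and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

NOISE = -1  # label for points that belong to no cluster


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def region_query(points: List[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if manhattan(points[i], q) <= eps]


def dbscan_manhattan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN using the Manhattan metric; returns one label per point."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return [NOISE if label is None else label for label in labels]


def silhouette_manhattan(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient over non-noise points, using Manhattan distance."""
    clusters: Dict[int, List[Sequence[float]]] = {}
    for p, label in zip(points, labels):
        if label != NOISE:
            clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, label in zip(points, labels):
        if label == NOISE:
            continue
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(manhattan(p, q) for q in own) / (len(own) - 1)  # distance to itself is 0
        b = min(sum(manhattan(p, q) for q in other) / len(other)
                for other_label, other in clusters.items() if other_label != label)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan_manhattan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
print(silhouette_manhattan(points, labels))
```

    Excluding noise from the silhouette flatters the score when many points are discarded, so the noise fraction is worth reporting alongside it.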

  3. Presents the comparison of various density based algorithms.

    • plos.figshare.com
    xls
    Updated Dec 19, 2024
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana (2024). Presents the comparison of various density based algorithms. [Dataset]. http://doi.org/10.1371/journal.pone.0313890.t001
    Available download formats: xls
    Dataset updated: Dec 19, 2024
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Presents the comparison of various density based algorithms.

  4. Details of overall clustering results.

    • plos.figshare.com
    xls
    Updated Dec 19, 2024
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana (2024). Details of overall clustering results. [Dataset]. http://doi.org/10.1371/journal.pone.0313890.t005
    Available download formats: xls
    Dataset updated: Dec 19, 2024
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

  5. Details of PID image dataset.

    • plos.figshare.com
    xls
    Updated Dec 19, 2024
    Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana (2024). Details of PID image dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0313890.t003
    Available download formats: xls
    Dataset updated: Dec 19, 2024
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Shahneela Pitafi; Toni Anwar; I Dewa Made Widia; Zubair Sharif; Boonsit Yimwadsana
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

  6. Research on ML algorithms for segmenting shopping data

    • data.europa.eu
    csv
    Updated May 16, 2024
    Sagra Technology Sp. z o.o. (2024). Research on ML algorithms for segmenting shopping data [Dataset]. https://data.europa.eu/88u/dataset/https-dane-gov-pl-pl-dataset-3876-badanie-algorytmow-ml-do-segmentacji-danych-zakupowych
    Available download formats: csv (multiple files)
    Dataset updated: May 16, 2024
    Dataset authored and provided by: Sagra Technology Sp. z o.o.
    Description

    The dataset contains the results of research on available ML algorithms for analyzing shopping data from retail outlets (SKU-level product data), carried out as part of the project "Research and development work on the creation of a platform with a built-in AI/ML engine, addressed to participants of the distribution chain of the FMCG and Consumer Health markets".

    The study evaluated the effectiveness of the algorithms as a function of the segmentation parameters used, e.g. the number of SKUs in the purchasing data, the assumed number of purchasing patterns, the data processing time and the number of segments in the resulting data.

    Synthetic shopping data in many configurations (input parameters) were used for the study.

    The included resources are:

    - input data to the segmentation process

    - segmentation result data along with a comparison of the effectiveness of the algorithms

    The file structure is described in additional documents.

    The published research results cover one data set for each combination of number of SKUs and number of concepts.

    Two factorization methods (LDA and NMF) were tested in TRL3; NMF, the method retained for the later stages, was found to be more effective.

    Details of the research at each TRL (Technology Readiness Level):

    - TRL3: results using LDA and NMF factorization, plus a study of the hierarchical clustering (hc, ALG1), k-means (ALG2), clique (ALG3), DBSCAN (ALG4) and Affinity Propagation (APC, ALG5) algorithms

    - TRL4: results using NMF factorization, plus a study of the hierarchical clustering (hc, ALG1) and k-means (ALG2) algorithms

    - TRL5: results using NMF factorization, plus a study of the hierarchical clustering (hc, ALG1) and k-means (ALG2) algorithms

    - TRL6: results using real data and NMF factorization, plus a study of the hierarchical clustering (hc, ALG1), k-means (ALG2) and clique (ALG3) algorithms

    Summary:

    1. Synthetic data sets (number of concepts: 2 to 10) were subjected to factorization with the NMF algorithm and to segmentation with the KMeans algorithm and hierarchical Agglomerative Clustering, as well as the proprietary clique algorithm.
    2. The AI/ML Platform's segmentation efficiency was estimated for synthetic data with different numbers of SKUs (Stock Keeping Units), i.e. 15, 90, 540 and 1080.
    3. The processing of each synthetic data set was measured precisely: information about the processed data set, the process parameters and the values of the measures evaluating the segmentation stage were collected. For this purpose, a module monitoring the processing time was configured and launched.
    4. The analysis of data processing time by the AI/ML Platform showed a clear dependence on the number of SKUs, i.e. on the size of the product portfolio: the broader the product portfolio, the longer the data processing time. Organizations with a broader product range can therefore expect longer data processing times.

    5. Factorization and segmentation on real data required the development of a segmentation solution operating in an unsupervised learning regime.

    6. The test results showed that the Agglomerative Clustering algorithm took longer to segment the data than the KMeans algorithm.

    7. The KMeans and Agglomerative Clustering algorithms, combined with NMF factorization, achieved similar segmentation quality scores on synthetic data.

    8. The tests on synthetic data also included the proprietary clique algorithm, which prioritizes high segment homogeneity. This algorithm obtained a lower adjusted Rand Index than the KMeans and Agglomerative Clustering algorithms; however, this result applies only to synthetic data, so additional tests were conducted in TRL6 using the clique algorithm on real data. These tests confirmed that the clique algorithm worked well on real data and proved more effective for this type of data (PSD purchasing data), where the primary business goal was to achieve highly homogeneous segments.
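
    Item 8 above compares algorithms by their adjusted Rand index against the known structure of the synthetic data. A minimal pure-Python sketch of the adjusted Rand index, computed from pair counts in the contingency table of two labelings, is shown below. The function name and the example labelings are hypothetical, and a library routine such as scikit-learn's `adjusted_rand_score` would normally be used instead.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index of two labelings of the same items (1.0 = identical partitions)."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree about
    # Pairs of items grouped together in both labelings (contingency-table cells).
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together within each labeling separately (row and column sums).
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs  # expected index under random labeling
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # both labelings trivial and identical (e.g. one cluster each)
    return (index - expected) / (max_index - expected)


concepts = [0, 0, 0, 1, 1, 1, 2, 2]       # hypothetical ground-truth concepts
kmeans_labels = [1, 1, 1, 0, 0, 0, 2, 2]  # same partition, different label names
agglo_labels = [0, 0, 1, 1, 1, 1, 2, 2]   # one item assigned to the wrong group
print(adjusted_rand_index(concepts, kmeans_labels))  # 1.0
print(adjusted_rand_index(concepts, agglo_labels))   # about 0.545
```

    Because the index only compares which items are grouped together, renaming the cluster labels (as in `kmeans_labels`) does not change the score.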

  7. Analysing Second Hand Car Sales Data

    • kaggle.com
    zip
    Updated May 9, 2024
    Muhammad Anas Sarwar (2024). Analysing Second Hand Car Sales Data [Dataset]. https://www.kaggle.com/datasets/devantltd/analysing-second-hand-car-sales-data
    Available download formats: zip (1024089 bytes)
    Dataset updated: May 9, 2024
    Authors: Muhammad Anas Sarwar
    License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/ (license information was derived automatically)

    Description

    Analysing Second Hand Car Sales Data with Supervised and Unsupervised Learning Models

    The second-hand car market is a dynamic and complex sphere, influenced by criteria such as manufacturer, model, engine specification, fuel consumption, year of production, mileage and price. In this exercise, we examine mock data on sales of second-hand (used) cars in the UK. The data consist of 50,000 records, each describing a single car sale transaction. Using supervised and unsupervised learning, we analyze the dataset to predict car prices with regression models and to identify cluster patterns.

    **Single Numerical Input Feature Regression Models** We started by fitting regression models that predict car price from each numerical input feature on its own, such as mileage and engine size. We then analyzed the associations between price and numerical factors such as engine size, year of manufacture and mileage. Engine size had the strongest relationship with price, suggesting that it is the most powerful price driver. While a linear model was appropriate for year of manufacture, features with more complex relationships to price, such as engine size, needed a non-linear model to capture price variation accurately.

    **Multiple Numerical Input Feature Regression Models** The analysis was then extended to use several numerical input features together, and the accuracy of the resulting price predictions was evaluated. Adding further inputs, such as year of manufacture and mileage, improved predictive performance compared with the single-feature models, highlighting the value of considering many factors simultaneously when predicting car prices.

    **Regression Model with Categorical Variables** To extend the prediction models further, we added the categorical attributes manufacturer and model to the regression, which improved the model's effectiveness.

    **Artificial Neural Network (ANN) Model**

    Next, we implemented an Artificial Neural Network (ANN) model. The ANN performed competitively with the other supervised learning models, which can be attributed to its ability to learn very complex relationships from the data. Its architecture and hyperparameters were carefully tuned for the best results, demonstrating its flexibility and effectiveness on complex datasets.

    **Model Comparison and Conclusion** After a comprehensive assessment, the Random Forest Regressor was found to be the most effective model for forecasting car prices. Its ability to incorporate both numerical and categorical variables, together with its strong predictive power, made it the preferred choice. Evaluation metrics and visualizations gave a full picture of model performance and supported the conclusion that the Random Forest Regressor performed best.

    **k-Means Clustering Algorithm** Turning to unsupervised learning, we applied the k-Means clustering algorithm to detect clusters in the car sales dataset. Varying the set of input features, we chose the number of clusters (k) using the silhouette score. Engine size, year of manufacture and mileage proved critical for obtaining the best clusters, underlining their importance for segmenting the dataset. **Comparison with Other Clustering Algorithms** Lastly, we compared the k-Means results with those of other clustering techniques, such as DBSCAN and hierarchical clustering, evaluating each method with rigorous metrics to identify the most effective clustering approach for this dataset. While k-Means achieved promising results, the comparison with DBSCAN and hierarchical clustering emphasized that several algorithms should be considered for clustering. **Conclusion** Finally, our extensive analysis of the used-car sales data has demonstrated favorable results for both supervised and unsupervised learning techniques in understanding the data through regression models and so...
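
    The k-Means step above picks k by the silhouette score. The sketch below shows that selection loop in plain Python under stated assumptions: Lloyd's algorithm initialized with randomly chosen data points (a fixed `seed` for reproducibility), Euclidean distance, and features such as engine size, year of manufacture and mileage already standardized. The data and function names are hypothetical; this is not the original analysis code.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means, initialised with k randomly chosen data points."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer move the centroids
        centroids = new_centroids
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient (Euclidean); -1.0 if fewer than two clusters."""
    clusters: Dict[int, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return -1.0  # worst possible score, so such a k is never selected
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # distance to itself is 0
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items() if other_label != label)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(points: List[Point], k_values: Iterable[int], seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Run k-means for each candidate k and keep the k with the best silhouette."""
    scores = {k: silhouette(points, kmeans(points, k, seed=seed)) for k in k_values}
    best_k = max(scores, key=scores.get)  # ties go to the k tried first
    return best_k, scores


# Hypothetical, already standardized (engine size, mileage) pairs forming two groups.
cars = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
best_k, scores = choose_k(cars, range(2, 5))
print(best_k)  # 2
```

    Standardizing first matters because raw mileage would otherwise dominate the Euclidean distances over engine size.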

  8. Data from: Explaining human mobility predictions through a pattern matching...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 18, 2021
    Smolak, Kamil; Rohm, Witold; Siła-Nowicka, Katarzyna (2021). Explaining human mobility predictions through a pattern matching algorithm [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5788700
    Dataset updated: Dec 18, 2021
    Dataset provided by: University of Auckland; Wrocław University of Environmental and Life Sciences
    Authors: Smolak, Kamil; Rohm, Witold; Siła-Nowicka, Katarzyna
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    File names follow the pattern {type of sequence}_{type of measure}_{sequence properties}_{additional information}.csv, where:

    - {type of sequence} is 'synth' for synthetic data or 'london' for real mobility data from London, UK.
    - {type of measure} is 'r2' for the R-squared measure or 'corr' for Spearman's correlation.
    - {sequence properties}: for synthetic data, one of the three sequence types described in the research article (random, markovian, nonstationary). For real mobility data, this part contains the data processing parameters: (...)_london_{type of mobility sequence}_{DBSCAN epsilon value}_{DBSCAN min_pts value}, where {type of mobility sequence} is 'seq' for next-place sequences, or '30min' or '1H' for next-time-bin sequences, indicating the size of the time bin.
    - Files whose names end with 'predictability' contain the R-squared and Spearman's correlation of the measures calculated in relation to the predictability measure.

    R2 files include R-squared values for all types of modelled regression functions:

    - 'line': y = ax + b for a single variable; y = ax + by + c for two variables.
    - 'expo': y = a*x^b + c for a single variable; y = a*x^b + c*y^d + e for two variables.
    - 'log': y = a*log(x*b) + c for a single variable; y = a*x + c*log(y) + e + d*x*log(y) for two variables.
    - 'logf': y = a*log(x) + c*log(y) + e + b*log(x)*log(y) for two variables.
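
    As an illustration of the two measures named above, the sketch below fits the single-variable 'line' model by ordinary least squares and reports its R-squared, and computes Spearman's correlation as the Pearson correlation of average ranks (tied values share the mean rank). It is a plain-Python sketch with hypothetical function names, not the authors' code; the non-linear 'expo' and 'log' forms would need a non-linear least-squares routine and are not shown.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, R-squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                    # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum((ui - mean_u) ** 2 for ui in u))
    norm_v = math.sqrt(sum((vi - mean_v) ** 2 for vi in v))
    return cov / (norm_u * norm_v)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
print(fit_line(x, [2, 4, 6, 8, 10]))   # (2.0, 0.0, 1.0)
print(spearman(x, [1, 4, 9, 16, 25]))  # 1.0 up to floating-point rounding
```

    The second example shows why both measures are useful: y = x² is not a straight line, yet it is perfectly monotonic, so Spearman's correlation is 1.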

  9. Patients’ age.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger (2023). Patients’ age. [Dataset]. http://doi.org/10.1371/journal.pone.0280995.t002
    Available download formats: xls
    Dataset updated: Jun 21, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Patients’ age.

  10. Analgesics and their distribution in clusters.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 21, 2023
    Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger (2023). Analgesics and their distribution in clusters. [Dataset]. http://doi.org/10.1371/journal.pone.0280995.t003
    Available download formats: xls
    Dataset updated: Jun 21, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Analgesics and their distribution in clusters.

  11. Data from: Orthopedic procedures.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 21, 2023
    Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger (2023). Orthopedic procedures. [Dataset]. http://doi.org/10.1371/journal.pone.0280995.t001
    Available download formats: xls
    Dataset updated: Jun 21, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Orthopedic procedures.

  12. Table 4_Geospatial clustering reveals dengue hotspots across Brazilian...

    • frontiersin.figshare.com
    bin
    Updated Oct 27, 2025
    Brena F. Sena; Bobby Brooke Herrera; Danyelly Bruneska Gondim Martins; Jose Luiz Lima Filho (2025). Table 4_Geospatial clustering reveals dengue hotspots across Brazilian municipalities, 2024.docx [Dataset]. http://doi.org/10.3389/fpubh.2025.1620914.s008
    Available download formats: bin
    Dataset updated: Oct 27, 2025
    Dataset provided by: Frontiers
    Authors: Brena F. Sena; Bobby Brooke Herrera; Danyelly Bruneska Gondim Martins; Jose Luiz Lima Filho
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Introduction: Dengue virus (DENV) remains a major and recurrent public health challenge in Brazil. In 2024, the country experienced its largest recorded epidemic, with more than six million probable cases and substantial pressure on hospital systems. The epidemic’s highly heterogeneous burden highlights the need for municipal-scale geospatial analyses to identify actionable hotspots for targeted interventions.

    Methods: We conducted a nationwide clustering analysis using dengue case notifications and hospitalizations from the national SINAN surveillance system, with denominator populations from the Brazilian Institute of Geography and Statistics (IBGE). We calculated standardized case and hospitalization rates per 100,000 population for all municipalities. A multivariate density-based spatial clustering algorithm (DBSCAN) integrated municipality centroids with epidemiologic burden. Parameters (eps, minPts) were selected using k-distance inspection and sensitivity analyses. Temporal stability was assessed through monthly DBSCAN runs using a common parameter set, and climatic associations were evaluated by pairing dengue indicators with CHIRPS precipitation at 0–3 monthly lags.

    Results: DBSCAN identified 25 high-burden municipal clusters, with 5,111 municipalities (92.6%) clustered and 408 (7.4%) classified as noise. Several clusters exhibited average case rates exceeding 20,000 per 100,000 population, particularly in Minas Gerais, Paraná, and Bahia. Some high-incidence municipalities remained geographically isolated and unclustered. Hospitalization-only clustering produced similar geographic patterns. Monthly analyses revealed persistent high-burden clusters, and precipitation was positively associated with incidence at an approximately two-month lag.

    Discussion: This study demonstrates that integrating spatial, temporal, and climatic dimensions into a DBSCAN framework provides a reproducible method for delineating dengue hotspots at the municipal scale. By distinguishing high-intensity clusters from low-burden areas, the approach offers an operationally relevant tool for guiding vector control and outbreak response during dengue epidemics in Brazil.
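
    The Methods paragraph selects eps by inspecting a k-distance plot. A minimal sketch of that heuristic is shown below: it sorts each point's distance to its k-th nearest neighbour in descending order and takes the value at the knee, found here as the point farthest from the chord joining the curve's end points after rescaling both axes to [0, 1]. The knee rule, the Euclidean metric and the function names are assumptions rather than the study's procedure (which also used sensitivity analyses), and multivariate inputs such as centroid coordinates plus rates would normally be standardized first. Depending on whether a point counts as its own neighbour, k is set to minPts or minPts - 1. A small helper then reports the number of clusters and the clustered and noise fractions.

```python
import math
from typing import List, Sequence, Tuple


def k_distance_curve(points: List[Sequence[float]], k: int) -> List[float]:
    """Distance from each point to its k-th nearest other point, sorted descending."""
    curve = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(dists[k - 1])
    return sorted(curve, reverse=True)


def knee_value(curve: List[float]) -> float:
    """Curve value farthest from the chord joining the end points (axes rescaled to [0, 1])."""
    n = len(curve)
    lo, hi = min(curve), max(curve)
    if n < 3 or hi == lo:
        return curve[0]  # no bend to detect
    # After rescaling, a descending curve runs from (0, 1) to (1, 0), so the chord is
    # the line x + y = 1 and |x + y - 1| is proportional to the distance from it.
    def gap(i: int) -> float:
        return abs(i / (n - 1) + (curve[i] - lo) / (hi - lo) - 1.0)
    return curve[max(range(n), key=gap)]


def cluster_summary(labels: Sequence[int], noise_label: int = -1) -> Tuple[int, float, float]:
    """Number of clusters, fraction of points clustered, fraction labelled as noise."""
    n = len(labels)
    n_noise = sum(1 for label in labels if label == noise_label)
    n_clusters = len({label for label in labels if label != noise_label})
    return n_clusters, (n - n_noise) / n, n_noise / n


# Toy 2-D points: a dense group plus two isolated points (hypothetical data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (20, 0)]
curve = k_distance_curve(pts, k=2)
print(knee_value(curve))  # 1.0, a candidate eps

# Hypothetical labels reproducing the reported counts: 25 clusters, 408 noise points.
labels = [i % 25 for i in range(5111)] + [-1] * 408
print(cluster_summary(labels))  # (25, ~0.926, ~0.074)
```

    The last line checks the reported percentages: 5,111 of 5,519 municipalities is about 92.6% and 408 of 5,519 is about 7.4%.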

  13. Improved DBSCAN clustering algorithm.

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 6, 2023
    Xinhuan Zhang; Les Lauber; Hongjie Liu; Junqing Shi; Jinhong Wu; Yuran Pan (2023). Improved DBSCAN clustering algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0259472.t001
    Available download formats: xls
    Dataset updated: Jun 6, 2023
    Dataset provided by: PLOS ONE
    Authors: Xinhuan Zhang; Les Lauber; Hongjie Liu; Junqing Shi; Jinhong Wu; Yuran Pan
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Improved DBSCAN clustering algorithm.

  14. Pain level shift and analgesic cocktails.

    • figshare.com
    xls
    Updated Jun 21, 2023
    Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger (2023). Pain level shift and analgesic cocktails. [Dataset]. http://doi.org/10.1371/journal.pone.0280995.t004
    Available download formats: xls
    Dataset updated: Jun 21, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Gerhard Fritsch; Heinz Steltzer; Daniel Oberladstaetter; Carolina Zeller; Hermann Prossinger
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Pain level shift and analgesic cocktails.

  15. Additional file 2 of A novel protein descriptor for the prediction of drug...

    • springernature.figshare.com
    txt
    Updated Feb 28, 2024
    Mingjian Jiang; Zhen Li; Yujie Bian; Zhiqiang Wei (2024). Additional file 2 of A novel protein descriptor for the prediction of drug binding sites [Dataset]. http://doi.org/10.6084/m9.figshare.9877679.v1
    Available download formats: txt
    Dataset updated: Feb 28, 2024
    Dataset provided by: Figshare (http://figshare.com/)
    Authors: Mingjian Jiang; Zhen Li; Yujie Bian; Zhiqiang Wei
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Protein list for the experiment with various DBSCAN parameters. This file contains randomly selected training proteins for the experiment with various DBSCAN parameters. All proteins come from the sc-PDB database, 3000 for training, 1000 for validation and 1000 for testing. (CSV 81 kb)

  16. DBSCAN model parameter settings.

    • plos.figshare.com
    xls
    Updated Mar 20, 2025
    Ming Jiang; Dongpeng Peng; Haihan Yu; Shu Chen (2025). DBSCAN model parameter settings. [Dataset]. http://doi.org/10.1371/journal.pone.0319786.t002
    Available download formats: xls
    Dataset updated
    Mar 20, 2025
    Dataset provided by
    PLOS: http://plos.org/
    Authors
    Ming Jiang; Dongpeng Peng; Haihan Yu; Shu Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Economic losses in the car rental industry due to customer breaches remain a critical issue. The rapid growth of the vehicle leasing market has given rise to a pressing concern for enterprises, namely the economic loss, vehicle idleness, and service quality degradation that are often associated with customer default. This study proposes an innovative vehicle rental early warning system that incorporates the improved DBSCAN clustering technique and the iTransformer model. The enhanced DBSCAN technique, which employs a snow ablation optimizer (SAO) algorithm, establishes an electronic barrier and integrates the iTransformer model for trajectory prediction. This enables the real-time monitoring of potential customer defaults and the reduction of economic losses that leasing companies may incur as a result of customer defaults. The system identifies and prevents default risks in a timely manner through a comprehensive analysis of vehicle driving data, thereby safeguarding the interests of corporate entities. The system employs vehicle driving data provided by a Chinese company to accurately identify the vehicle’s resident location and predict future trajectory, effectively preventing customer defaults. The experimental results demonstrate that the model is highly effective in predicting the vehicle’s resident location and future trajectory. The mean square error (MSE), mean absolute error (MAE), and location error reached 0.001, 0.003, and 0.08 kilometers, respectively, which substantiates the model’s efficiency and accuracy. This study has the additional benefit of providing effective warnings to customers of potential default behavior, thereby reducing the economic losses incurred by enterprises. Such an approach not only ensures financial security but also enhances operational efficiency within the industry. Furthermore, it offers robust support for the sustainable development of the car rental industry.
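    The abstract reports MSE, MAE and a location error in kilometres but does not define how the location error is computed. A minimal sketch, assuming the location error is the great-circle (haversine) distance between a predicted and an actual position given as latitude/longitude in degrees, with a mean Earth radius of 6371 km; `mse`, `mae` and `haversine_km` are illustrative names, not the study's code.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not taken from the study


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true, y_pred):
    """Mean squared error."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

    Under this assumption, a location error of 0.08 km would be the haversine distance between the predicted and the true resident location.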

  17. A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling...

    • tandf.figshare.com
    application/gzip
    Updated May 31, 2023
    Raffaele Argiento; Andrea Cremaschi; Alessandra Guglielmi (2023). A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling Gaussian Mixture Models [Dataset]. http://doi.org/10.6084/m9.figshare.1209700.v3
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Raffaele Argiento; Andrea Cremaschi; Alessandra Guglielmi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose a new model for cluster analysis in a Bayesian nonparametric framework. Our model combines two ingredients, species sampling mixture models of Gaussian distributions on one hand, and a deterministic clustering procedure (DBSCAN) on the other. Here, two observations from the underlying species sampling mixture model share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold; this yields a random partition which is coarser than the one induced by the species sampling mixture. Since this procedure depends on the value of the threshold, we suggest a strategy to fix it. In addition, we discuss implementation and applications of the model; comparison with more standard clustering algorithms will be given as well. Supplementary materials for the article are available online.
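    The merging rule can be illustrated with a small sketch. It assumes univariate Gaussian components, uses the Hellinger distance as the distance between densities (an assumption; the description does not name the distance), and takes the transitive closure of the "distance below threshold" relation, which is essentially what DBSCAN does when a single point suffices to form a core. `hellinger_gauss` and `merge_by_threshold` are hypothetical names.

```python
from math import exp, sqrt


def hellinger_gauss(m1, s1, m2, s2):
    """Hellinger distance between N(m1, s1^2) and N(m2, s2^2)."""
    v = s1 ** 2 + s2 ** 2
    # Bhattacharyya coefficient of two univariate Gaussians
    bc = sqrt(2 * s1 * s2 / v) * exp(-((m1 - m2) ** 2) / (4 * v))
    return sqrt(max(0.0, 1.0 - bc))


def merge_by_threshold(params, threshold):
    """params[i] = (mu, sigma) of observation i's latent component.

    Returns one cluster label per observation: observations are linked when
    their densities are closer than `threshold`, and clusters are the
    connected components of those links (union-find).
    """
    parent = list(range(len(params)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(params)):
        for j in range(i + 1, len(params)):
            if hellinger_gauss(*params[i], *params[j]) < threshold:
                parent[find(i)] = find(j)

    roots = {}  # relabel roots as 0, 1, ... in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(len(params))]
```

    Because identical latent parameters are at distance 0, any positive threshold keeps observations that share a mixture component together, so the resulting partition can only be coarser than the mixture's, as the description states.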

  18. Data_Sheet_1_Standardizing Single-Frame Phase Singularity Identification...

    • frontiersin.figshare.com
    docx
    Updated Jun 1, 2023
    Xin Li; Tiago P. Almeida; Nawshin Dastagir; María S. Guillem; João Salinet; Gavin S. Chu; Peter J. Stafford; Fernando S. Schlindwein; G. André Ng (2023). Data_Sheet_1_Standardizing Single-Frame Phase Singularity Identification Algorithms and Parameters in Phase Mapping During Human Atrial Fibrillation.docx [Dataset]. http://doi.org/10.3389/fphys.2020.00869.s001
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media: http://www.frontiersin.org/
    Authors
    Xin Li; Tiago P. Almeida; Nawshin Dastagir; María S. Guillem; João Salinet; Gavin S. Chu; Peter J. Stafford; Fernando S. Schlindwein; G. André Ng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Purpose: Recent investigations failed to reproduce the positive rotor-guided ablation outcomes shown by initial studies for treating persistent atrial fibrillation (persAF). Phase singularity (PS) is an important feature for AF driver detection, but algorithms for automated PS identification differ. We aim to investigate the performance of four different techniques for automated PS detection.

    Methods: 2048-channel virtual electrogram (VEGM) and electrocardiogram signals were collected for 30 s from 10 patients undergoing persAF ablation. QRST-subtraction was performed and VEGMs were processed using sinusoidal wavelet reconstruction. The phase was obtained using the Hilbert transform. PSs were detected using four algorithms: (1) 2D image processing based and neighbor-indexing algorithm; (2) 3D neighbor-indexing algorithm; (3) 2D kernel convolutional algorithm estimating topological charge; (4) topological charge estimation on 3D mesh. PS annotations were compared using the structural similarity index (SSIM) and Pearson’s correlation coefficient (CORR). Optimized parameters to improve detection accuracy were found for all four algorithms using the Fβ score and 10-fold cross-validation against manual annotation. Local clustering with density-based spatial clustering of applications with noise (DBSCAN) was proposed to improve algorithms 3 and 4.

    Results: The PS density maps created by each algorithm with default parameters were poorly correlated. Phase gradient threshold and search radius (or kernels) were shown to affect PS detections. The processing times for the algorithms were significantly different (p < 0.0001). The Fβ scores for algorithms 1, 2, 3, 3 + DBSCAN, 4 and 4 + DBSCAN were 0.547, 0.645, 0.742, 0.828, 0.656, and 0.831. Algorithm 4 + DBSCAN achieved the best classification performance with acceptable processing time (2.0 ± 0.3 s).

    Conclusion: AF driver identification is dependent on the PS detection algorithms and their parameters, which could explain some of the inconsistencies in rotor-guided ablation outcomes in different studies. For 3D triangulated meshes, algorithm 4 + DBSCAN with optimal parameters was the best solution for real-time, automated PS detection owing to its accuracy and speed. Similarly, algorithm 3 + DBSCAN with optimal parameters is preferred for uniform 2D meshes. Such algorithms – and parameters – should be preferred in future clinical studies for identifying AF drivers and minimizing methodological heterogeneities. This would facilitate comparisons of rotor-guided ablation outcomes in future works.
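    Two quantities in this abstract can be made concrete. The Fβ score weights recall β times as much as precision; from counts of true positives (TP), false positives (FP) and false negatives (FN) it is Fβ = (1 + β²)TP / ((1 + β²)TP + β²FN + FP). The topological charge of a phase singularity is the number of times the phase winds through 2π along a small closed loop around it. The sketch below is a hedged illustration of both ideas, not any of the four published algorithms: the β value used in the study is not stated here, and `f_beta` and `topological_charge` are hypothetical names.

```python
from math import pi


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from detection counts (detections matched against manual annotation)."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def wrap(angle):
    """Wrap a phase difference into [-pi, pi)."""
    return (angle + pi) % (2 * pi) - pi


def topological_charge(loop_phases):
    """Winding number of the phase along a closed loop of samples.

    loop_phases: phase values (radians) sampled in order around the loop;
    counter-clockwise order gives +1 for a standard counter-clockwise rotor.
    """
    n = len(loop_phases)
    total = sum(wrap(loop_phases[(k + 1) % n] - loop_phases[k]) for k in range(n))
    return round(total / (2 * pi))
```

    Summing wrapped phase differences, rather than raw ones, is what lets the loop integral count full 2π windings despite the phase being stored modulo 2π.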

  19. ChinaExtreDroEventSet (v1.0): Extreme meteorological drought events over...

    • plus.figshare.com
    docx
    Updated Apr 2, 2024
    Zhenchen Liu; wen zhou (2024). ChinaExtreDroEventSet (v1.0): Extreme meteorological drought events over China (1951-2022) [Dataset]. http://doi.org/10.25452/figshare.plus.25512334.v1
    Available download formats: docx
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Figshare+
    Authors
    Zhenchen Liu; wen zhou
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Dataset and Event List files prefixed with ChinaExtreDroEventSet(v1.0)_ are supplementary files for the manuscript entitled “Extreme Meteorological Droughts over China (1951–2022): event detection, migration pattern, and diversity of temperature extremes”, which has been submitted to Advances in Atmospheric Sciences (AAS) for second-round review.

    (1) Dataset: ChinaExtreDroEventSet(v1.0)_01_Dataset_AAS_LiuZhou2024_20240330.zip contains the data files of extreme meteorological droughts over China (1951–2022). Each first-level file name (e.g., Dro-06_P0_m1p0_40pts) consists of a drought event order (e.g., Dro-06), a patch code (e.g., P0), and the parameter configuration used for event detection (e.g., m1p0_40pts). In the patch code, P0 denotes the unique patch representing the drought event, while Pi (i = 1, 2, ..., N) denotes separate patches belonging to one complete drought event. In the parameter configuration, the string m1p0 indicates that the 3D discrete gridded dataset input to the DBSCAN algorithm consists of grid points with SPAI less than −1.0, and the string 40pts gives min_samples, a key parameter of the DBSCAN algorithm (Ester et al., 1996): the number of sample points within a given search distance. Details are provided in Liu et al. (2023, AOSL). The formats and meanings of the files for each drought event are identical to those in the Glo3DHydroClimEventSet(v1.0) database (Liu and Zhou, 2023).

    (2) Event list: ChinaExtreDroEventSet(v1.0)_02_EventList_AAS_LiuZhou2024_20240330.docx lists the metrics and ranks of all drought events, as part of the manuscript.
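    The file-naming convention can be decoded mechanically. The sketch below is a hypothetical parser, not part of the dataset: it reads "m1p0" as "minus 1 point 0", that is, the SPAI threshold of −1.0 the description gives, and assumes that other configurations follow the same pattern (an optional leading "m" for a negative value, "p" as the decimal point). The name `parse_event_name` is illustrative.

```python
import re

# Hypothetical pattern for first-level names such as "Dro-06_P0_m1p0_40pts".
# Assumption: an optional leading "m" means negative, "p" is the decimal point.
NAME_RE = re.compile(r"^(Dro-\d+)_P(\d+)_(m?)(\d+)p(\d+)_(\d+)pts$")


def parse_event_name(name):
    """Split an event directory name into its documented components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised event name: {name!r}")
    event, patch, sign, int_part, frac_part, min_samples = match.groups()
    threshold = float(f"{int_part}.{frac_part}")
    if sign == "m":
        threshold = -threshold
    return {
        "event": event,                  # drought event order, e.g. "Dro-06"
        "patch": int(patch),             # 0 = unique patch; i >= 1 = separate patches
        "spai_threshold": threshold,     # grid points with SPAI below this feed DBSCAN
        "min_samples": int(min_samples), # DBSCAN min_samples
    }
```

    A parser like this makes it easy to group all patches of one event (same `event`, different `patch`) or to filter runs by detection parameters.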

  20. Additional file 1 of DBCSMOTE: a clustering-based oversampling technique for...

    • springernature.figshare.com
    zip
    Updated Feb 6, 2024
    Yanyun Tao; Yuzhen Zhang; Bin Jiang (2024). Additional file 1 of DBCSMOTE: a clustering-based oversampling technique for data-imbalanced warfarin dose prediction [Dataset]. http://doi.org/10.6084/m9.figshare.13126607.v1
    Available download formats: zip
    Dataset updated
    Feb 6, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Yanyun Tao; Yuzhen Zhang; Bin Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. DBCSMOTE.zip: MATLAB code files for generating minority and majority clusters. DBCSMOTE_demo.m: a demo of DBCSMOTE combined with random forest, which outputs the estimated dosage. ‘num’ is the number of iterations of DBCSMOTE; in each iteration, ‘evaluatePop’ calls the function that evaluates the oversampling quality. ‘train.txt’, ‘validate.txt’ and ‘test.txt’ are the subsets used for training, validation and testing. DBSCAN_fun.m: the DBSCAN algorithm. It clusters an input dataset using two parameters (Eps and MinPts) and returns the samples of the minority clusters and the number of clusters. RandomForest.m: the random forest function. Random forest is an ensemble of CARTs, the weak regression models, built on the training set extended by DBCSMOTE. CARTprediction.m: the CART algorithm, the weak regression model used in the random forest. It is also the tool used to evaluate the quality of the oversampled data generated by DBCSMOTE.
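    The description does not state how DBCSMOTE creates new samples inside the minority clusters returned by DBSCAN_fun.m. As an assumption, the sketch below applies standard SMOTE-style interpolation restricted to a single cluster: each synthetic sample lies on the segment between a randomly chosen member and one of its k nearest neighbours in the same cluster. It is written in Python rather than the MATLAB of the original files, and `smote_in_cluster` is a hypothetical name.

```python
import random
from math import dist


def smote_in_cluster(cluster, n_new, k=3, rng=None):
    """Generate n_new synthetic samples by SMOTE-style interpolation within one cluster.

    cluster: list of feature tuples (at least two points).
    k: number of nearest neighbours (inside the cluster) to interpolate towards.
    """
    if rng is None:
        rng = random.Random()
    if len(cluster) < 2:
        raise ValueError("need at least two points to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(cluster))
        x = cluster[i]
        # k nearest neighbours of x, searched only inside the same cluster
        others = [cluster[j] for j in range(len(cluster)) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

    Restricting neighbours to one cluster keeps synthetic samples inside dense minority regions instead of bridging separate clusters, which is the usual motivation for clustering before oversampling.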
