8 datasets found
  1. Data from: Comparing the effects of Euclidean distance matching and dynamic...

    • figshare.com
    • tandf.figshare.com
    html
    Updated Oct 7, 2025
    Lars De Sloover; Haosheng Huang; Mauk Hillewaert; Jana Verdoodt; Nico Van de Weghe (2025). Comparing the effects of Euclidean distance matching and dynamic time warping in the clustering of COVID-19 evolution: a spatiotemporal analysis of COVID-19 dynamics across Europe [Dataset]. http://doi.org/10.6084/m9.figshare.30294498.v1
    Available download formats: html
    Dataset updated
    Oct 7, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Lars De Sloover; Haosheng Huang; Mauk Hillewaert; Jana Verdoodt; Nico Van de Weghe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A comprehensive understanding of COVID-19’s spatiotemporal progression is crucial for devising effective containment strategies, yet existing mapping approaches often fail to capture the pandemic’s multifaceted evolution. This study addresses that gap by focusing on how well different time-series clustering methods handle the complexities of misalignment in outbreak trajectories. Specifically, we investigate the spatiotemporal dynamics of COVID-19 cases across NUTS 2 regions in Europe during the second pandemic wave in the winter of 2020–2021. We employ time series clustering using Euclidean Distance Matching (EDM) and Dynamic Time Warping (DTW) to identify distinct patterns of pandemic progression. The hierarchical clustering results, visualized through heatmap dendrograms, chorochromatic cluster maps, and mean time series plots, reveal heterogeneous spatiotemporal patterns at different levels of clustering granularity. DTW outperforms EDM in capturing the temporal dynamics, yielding better-defined clusters in terms of temporal similarity. While DTW generally shows higher spatial contiguity values, EDM maintains statistically significant spatial coherence across clusters, especially at higher numbers of clusters. We discuss the trade-offs between optimizing for temporal similarity and spatial contiguity, and their implications for understanding the spatiotemporal dynamics of COVID-19. The findings highlight the importance of considering both temporal and spatial aspects when analyzing the spread of infectious diseases.
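
    The misalignment problem that separates EDM from DTW can be seen on two toy curves with the same shape but a two-step lag. This is a minimal sketch, not the study's pipeline: it assumes EDM means lock-step Euclidean distance on equal-length, already-normalised series, and it uses the classic unconstrained DTW recursion with an absolute-difference local cost. The authors' preprocessing, warping-window settings, and linkage method are not given here, and the series values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Lock-step distance: compares a[i] with b[i] only (series must be equal length)."""
    if len(a) != len(b):
        raise ValueError("Euclidean matching needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference local cost.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step may
    advance one series, the other, or both, which lets shifted peaks line up.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance a only
                                     cost[i][j - 1],      # advance b only
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical case curves with identical shape; region_b peaks two steps later.
region_a = [0, 1, 3, 6, 3, 1, 0, 0, 0]
region_b = [0, 0, 0, 1, 3, 6, 3, 1, 0]

print(f"EDM: {euclidean_distance(region_a, region_b):.2f}")
print(f"DTW: {dtw_distance(region_a, region_b):.2f}")
```

    Because DTW can stretch the flat leading and trailing segments, the lagged curve aligns at zero cost, whereas lock-step matching penalises the lag. A full analysis would compute such distances for every pair of regions and pass the resulting matrix to a hierarchical clustering routine.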

  2. Homestays data

    • kaggle.com
    zip
    Updated May 25, 2024
    Priyanshu shukla (2024). Homestays data [Dataset]. https://www.kaggle.com/datasets/priyanshu594/homestays-data
    Available download formats: zip (44,330,689 bytes)
    Dataset updated
    May 25, 2024
    Authors
    Priyanshu shukla
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Objective: Build a robust predictive model to estimate the log_price of homestay listings based on a comprehensive analysis of their characteristics, amenities, and host information. First, make sure that the entire dataset is clean and ready to use.

    1. Feature Engineering: Enhance the dataset by creating actionable and insightful features. Calculate Host_Tenure as the number of years from host_since to the current date, as a measure of host experience. Generate Amenities_Count by counting the items listed in the amenities array, to quantify property offerings. Determine Days_Since_Last_Review as the number of days between last_review and today, to assess listing activity and relevance.
    2. Exploratory Data Analysis (EDA): Conduct a deep dive into the dataset to uncover underlying patterns and relationships. Analyze how pricing (log_price) correlates with both categorical features (such as room_type and property_type) and numerical features (such as accommodates and number_of_reviews). Use statistical tools and visualizations such as correlation matrices, histograms of distributions, and scatter plots to explore relationships between variables.
    3. Geospatial Analysis: Investigate the geographical data to understand regional pricing trends. Plot listings on a map using their latitude and longitude to visually assess the price distribution. Examine whether certain neighbourhoods, or proximity to city centres, influence pricing, providing a spatial perspective on pricing strategy.
    4. Sentiment Analysis on Textual Data: Apply advanced natural language processing techniques to the description texts to extract sentiment scores. Use sentiment analysis tools to determine whether positive or negative descriptions influence listing prices, and incorporate the scores as a feature in the predictive model.
    5. Amenities Analysis: Thoroughly parse and analyse the amenities provided in the listings. Identify which amenities are most associated with higher or lower prices by applying statistical tests for correlation, informing both pricing strategy and model inputs.
    6. Categorical Data Encoding: Convert categorical data into a format suitable for machine learning. Apply one-hot encoding to variables such as room_type, city, and property_type, so that the model interprets them as distinct features without any ordinal implication.
    7. Model Development and Training: Design and train predictive models to estimate log_price. Begin with a simple linear regression to establish a baseline, then explore more complex models such as RandomForest and GradientBoosting to better capture non-linear relationships and interactions between features. Briefly document the model-building process within the Jupyter notebook itself, specifying the choice of algorithms and the rationale.
    8. Model Optimization and Validation: Systematically optimize the models for the best performance. Use techniques such as grid search to experiment with different hyperparameter settings. Validate model choices with techniques such as k-fold cross-validation, ensuring that the model generalizes well to unseen data.
    9. Feature Importance and Model Insights: Analyze the trained models to identify which features most strongly affect log_price. Use model-specific methods such as feature importance scores for tree-based models, and SHAP values for an in-depth understanding of feature contributions.
    10. Predictive Performance Assessment: Critically evaluate the final model on a held-out test set. Use metrics such as Root Mean Squared Error (RMSE) and R-squared to assess accuracy and goodness of fit. Analyze the residuals in detail to check for patterns that might suggest model bias or misfit.
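
    Tasks 1 and 10 can be sketched in plain Python as follows. This is a minimal illustration, not a reference solution: the helper names are hypothetical, and the code assumes that host_since and last_review are ISO `YYYY-MM-DD` strings and that amenities is stored as a brace-wrapped, comma-separated string such as `{TV,"Wireless Internet",Kitchen}`. Check the actual file before relying on either format. The reference date is passed in explicitly so that results are reproducible.

```python
import csv
import math
from datetime import date


def host_tenure_years(host_since: str, today: date) -> float:
    """Years between host_since (ISO date string, assumed format) and the reference date."""
    return (today - date.fromisoformat(host_since)).days / 365.25


def amenities_count(amenities: str) -> int:
    """Count items in a string like '{TV,"Wireless Internet",Kitchen}' (assumed format)."""
    inner = amenities.strip().strip("{}")
    if not inner:
        return 0
    # csv.reader respects double quotes, so quoted items stay in one piece.
    items = next(csv.reader([inner]))
    return sum(1 for item in items if item.strip())


def days_since_last_review(last_review: str, today: date) -> int:
    """Whole days between last_review (ISO date string, assumed format) and the reference date."""
    return (today - date.fromisoformat(last_review)).days


def rmse(y_true, y_pred) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


ref = date(2024, 5, 25)  # hypothetical reference date
print(host_tenure_years("2014-05-25", ref))
print(amenities_count('{TV,"Wireless Internet",Kitchen}'))
print(days_since_last_review("2024-05-01", ref))
print(rmse([4.0, 5.0, 6.0], [4.5, 5.0, 5.5]))
print(r_squared([4.0, 5.0, 6.0], [4.5, 5.0, 5.5]))
```

    In a pandas workflow the same helpers could be applied column-wise with `Series.apply`. RMSE and R-squared should be computed on the reserved test set only, after all tuning is finished.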

  3. Lunar Resource Mapping Data Services Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataintelo (2025). Lunar Resource Mapping Data Services Market Research Report 2033 [Dataset]. https://dataintelo.com/report/lunar-resource-mapping-data-services-market
    Available download formats: pptx, pdf, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Lunar Resource Mapping Data Services Market Outlook

    According to our latest research, the lunar resource mapping data services market size reached USD 1.14 billion globally in 2024, with robust momentum driven by increasing investments in lunar exploration and resource utilization. The market is expected to grow at a CAGR of 17.8% from 2025 to 2033, reaching a forecasted value of USD 5.98 billion by 2033. This expansion is fueled by the convergence of advanced satellite imaging, data analytics, and an intensifying focus on lunar resource extraction and scientific research.
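
    The headline figures can be checked with the standard compound-growth relation, end value = start value × (1 + r)^n. The sketch below assumes annual compounding from the 2024 base of USD 1.14 billion over the nine years to 2033; the report does not state its exact convention. Under this assumption the 17.8% rate compounds to roughly USD 4.98 billion rather than the quoted USD 5.98 billion, and the quoted endpoints imply a rate nearer 20%, so the report's figures may rest on a different base year or compounding convention.

```python
def project_value(start: float, rate: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return start * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


base_2024 = 1.14      # USD billion, 2024 market size quoted in the report
rate = 0.178          # CAGR quoted in the report
years = 2033 - 2024   # assumption: compounding runs from the 2024 base

print(f"Projected 2033 size: USD {project_value(base_2024, rate, years):.2f} billion")
print(f"CAGR implied by USD 5.98 billion: {implied_cagr(base_2024, 5.98, years):.1%}")
```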

    One of the primary growth factors propelling the lunar resource mapping data services market is the accelerating pace of international lunar missions. Governments and private enterprises alike are racing to establish a presence on the Moon, seeking to identify and exploit valuable resources such as water ice, rare earth elements, and other minerals. The Artemis program by NASA, China’s Chang’e missions, and India’s Chandrayaan initiatives are just a few examples of the global commitment to lunar exploration. These missions rely heavily on detailed, high-resolution mapping and geospatial data to identify landing sites, assess resource potential, and plan safe and efficient mission trajectories. As a result, demand for advanced lunar data services is rising sharply, driving innovation and investment across the sector.

    Technological advancements are another significant driver for the lunar resource mapping data services market. The integration of artificial intelligence, machine learning, and big data analytics into satellite imaging and remote sensing technologies has revolutionized the accuracy, speed, and depth of lunar data analysis. Modern orbital satellites, landers, and rovers are now equipped with high-definition sensors capable of capturing multispectral and hyperspectral images, enabling precise identification of surface and sub-surface resources. These technological leaps have not only improved the quality of lunar mapping but have also expanded the range of applications, from mining exploration and infrastructure development to scientific research and space mission planning.

    The growing commercialization of space and the emergence of new space economies are further catalyzing the expansion of the lunar resource mapping data services market. As private companies such as SpaceX, Blue Origin, and ispace join established space agencies in lunar ventures, the need for accurate, real-time geospatial data is becoming critical for commercial decision-making and risk mitigation. This shift is fostering the development of a robust ecosystem of data providers, analytics firms, and visualization platforms, all competing to deliver value-added services to a diverse clientele that includes government agencies, commercial enterprises, and research institutions. The resulting market dynamism is expected to continue driving growth and innovation through the next decade.

    Regionally, North America currently dominates the lunar resource mapping data services market, accounting for the largest share in 2024 due to significant government investments, a mature space industry, and the presence of leading technology providers. However, Asia Pacific is rapidly emerging as a key growth region, propelled by ambitious lunar programs in China, India, and Japan, as well as increasing collaboration between public and private sectors. Europe also maintains a strong position, supported by the European Space Agency’s initiatives and a growing number of commercial space ventures. Meanwhile, Latin America and the Middle East & Africa are gradually entering the market through partnerships and technology transfers, contributing to a more diversified global landscape.

    Service Type Analysis

    The service type segment of the lunar resource mapping data services market encompasses satellite imaging, remote sensing, geospatial data analysis, data visualization, and other specialized services. Satellite imaging remains the cornerstone of lunar mapping, offering high-resolution visual and spectral data essential for identifying surface features, resource deposits, and potential landing sites. The evolution of satellite imaging technologies, such as synthetic aperture radar and multispectral sensors, has significantly improved the accuracy and granularity of lunar surface maps. Companies specializing in satellite imaging are no

  4. Data from: Establishing the relationship between urban land-cover...

    • datasetcatalog.nlm.nih.gov
    • tandf.figshare.com
    Updated Jan 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Li, Xiaoxiao; Galletti, Christopher S.; Connors, John Patrick (2021). Establishing the relationship between urban land-cover configuration and night time land-surface temperature using spatial regression [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000777793
    Dataset updated
    Jan 13, 2021
    Authors
    Li, Xiaoxiao; Galletti, Christopher S.; Connors, John Patrick
    Description

    Studies suggest that urban form can influence microclimate regulation. Remote sensing studies have contributed to these findings through analysis of high-resolution land cover maps, landscape ecology metrics, and thermal imagery. Collectively, these have been referred to as land cover configuration studies. There are three objectives to this study. The first is to assess the relationship between nighttime land surface temperatures (LST) and land cover configuration and composition. The second objective is to outline a comprehensive methodology that includes ordinary least squares (OLS), spatial regression, variable selection, and multicollinearity analysis. Our last objective is to test three hypotheses about the relationship between LST and land cover, which can briefly be described as: 1) the importance of land-use regimes in modeling LST from land cover composition and configuration variables; 2) the strength of the correlation between LST and roads, buildings, and vegetation; and 3) the improved quality of models using landscape metrics in modeling the relationship between LST and land cover. Based on 16 different models (8 OLS, 8 spatial regression), we could confirm the above hypotheses, but we found that the configuration of buildings, roads, and vegetation has a complex relationship with LST. Our interpretation of this complexity, combined with the strength of composition variables, is that parsimonious models are, for now, more useful to urban planners because they are more generalizable. Finally, spatial regression models of land cover configuration and LST demonstrated an improvement over non-spatial linear models (OLS). Spatial regression models reduced heteroskedasticity and clusters of residuals, and tempered coefficients, suggesting that the OLS models could be biased. OLS models were still found to be a valuable tool for exploratory analysis.
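
    The "clusters of residuals" that motivate spatial regression are commonly diagnosed with Moran's I, a standard statistic for spatial autocorrelation. The sketch below is an illustration added here, not the authors' workflow: it fits a one-predictor OLS line to a hypothetical transect of cells (the vegetation fractions and temperatures are invented) and computes Moran's I of the residuals with simple neighbour (contiguity) weights. Values near zero suggest little spatial structure; clearly positive values indicate clustered residuals, the situation in which a spatial lag or spatial error model is usually preferred.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def morans_i(values, weights):
    """Global Moran's I for `values` with an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


def chain_weights(n):
    """Binary contiguity for cells along a line: the neighbours of i are i-1 and i+1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical transect: vegetation fraction and night-time LST (deg C) per cell.
veg = [0.10, 0.15, 0.20, 0.30, 0.50, 0.55, 0.60, 0.70]
lst = [24.0, 23.9, 23.6, 22.4, 20.2, 20.4, 20.0, 19.6]
a, b = ols_fit(veg, lst)
residuals = [y - (a + b * x) for x, y in zip(veg, lst)]
print(f"slope = {b:.2f} deg C per unit vegetation fraction")
print(f"Moran's I of OLS residuals = {morans_i(residuals, chain_weights(len(veg))):.2f}")
```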

  5. Spatial analysis data for 'Lines in the sand: quantifying the cumulative...

    • researchdata.edu.au
    Updated 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Raiter, K.G. (2017). Spatial analysis data for 'Lines in the sand: quantifying the cumulative development footprint in the world's largest remaining temperate woodland' [Dataset]. http://doi.org/10.4227/05/59893d248decc
    Dataset updated
    2017
    Dataset provided by
    Advanced Ecological Knowledge and Observation System
    ÆKOS Data Portal, rights owned by University of Western Australia
    Authors
    Raiter, K.G.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 2, 2012 - Jul 30, 2014
    Description

    These datasets provide the data underlying the publication (see abstract below). The datasets are:

    data in csv format:

    1. development footprint by sample area:

    Information on the 24 sample areas (~490 km² each) assessed in the study, including the different infrastructure types: roads, railways, mapped tracks, unmapped tracks (manually digitised in the study from aerial imagery), and hub infrastructure such as mine pits and waste rock dumps (also manually digitised in the study). It also contains key covariables assessed as potential explanatory variables for development footprint.

    The region-wide modelling of development footprint found strong positive effects of mining project density and pastoralism, as well as a highly significant negative interaction between the two. At low mining project densities, development footprints are more extensive in pastoral areas, but at high mining project densities, pastoral areas are relatively less developed than non-pastoral areas, on average.

    2. gww 20 km grid:

    This dataset provides data for the 20x20 km grid placed over the whole Great Western Woodlands and used for the regional estimation of development footprint, linear infrastructure density, and linear infrastructure type based on the region-wide analysis. Data is for each cell in the grid and provides the total length of roads in that grid cell, MINEDEX mining projects, pastoral status, etc.

    This dataset was used to project the data from the 24 study areas across the whole of the Great Western Woodlands and calculate region-wide estimates of development footprint and linear infrastructure lengths.

    3. disturbance by patch:

    This dataset provides the data for each patch for the analysis of patch-level drivers of development footprint, which was performed to gain further insight into the effects of other landscape variables beyond what could be gleaned from the region-wide analysis. For this analysis, we divided sample areas into polygonal ‘patch types’, each with a unique combination of the following categorical covariables: pastoral tenure, greenstone lithology, conservation tenure, ironstone formation, schedule 1 area clearing restrictions, environmentally sensitive area designation, vegetation formation, and sample area.

    For each patch type (n=261), we calculated the following attributes: number of mining projects, number of dead mineral tenements, sum of duration of all live and dead tenements, type of tenements (exploration/prospecting tenement, mining and related activities tenement, none), primary target commodity (gold, nickel, iron-ore, other), distance to wheatbelt, and distance to nearest town.

    4. mapped versus digitised tracks:

    This dataset provides mapped and unmapped track widths, measured using high-resolution aerial imagery at 20 or more randomly generated locations within each of the 24 sample areas. Pastoral tenure and mining intensity for each sample area are included for analysis purposes.

    These data were analysed as follows: we used a t-test to test for a difference between mapped and unmapped track widths, conducted data exploration following Zuur et al. (2009), and modelled track widths using linear mixed models with the ‘lme4’ package in R. We created a global model containing the following fixed variables: mapped/unmapped status, mining activity level for the relevant sample area, and pastoral status. Sample area identity was included as the random effect in all models after testing for its significance.

    We used the ‘dredge’ function in the ‘MuMIn’ package to fit all possible subsets of the global model and rank them by AICc. The optimal model included only mapped/unmapped status as a fixed effect, and the other top-ranking model also included a positive effect of pastoral tenure on track width. Mapped tracks were on average ~1 m wider than unmapped tracks (p < 0.001) (Figure A2.1). Average widths of mapped and unmapped tracks were 6.06 m (s.e. 0.15 m) and 4.92 m (s.e. 0.10 m), respectively. No effect of mining activity was included in the top-ranking models.

    5. edge effect scenarios:

    Hypothetical edge effect zones were created, based on effect zones gleaned from the literature and arranged under three scenarios, to reflect potential risks of offsite impacts in areas adjacent to development footprints observed (see appendix 3 of article).

    The calculated proportion of the entire GWW within edge effect zones varied from ~3% under the conservative scenario to ~35% under the maximal scenario. Within the range of development footprints observed in this study, the proportion of a landscape that lies within edge effect zones increases hyperbolically with the number of mining projects, and approaches 100% in the maximal scenario, 60% in the moderate scenario, and ~20% under the conservative scenario.
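
    The AICc ranking performed with the ‘dredge’ function can be sketched directly from its definition, AICc = AIC + 2k(k+1)/(n − k − 1) with AIC = 2k − 2 ln L, together with the ΔAICc and Akaike weights that such rankings usually report. The model names, log-likelihoods, parameter counts, and sample size below are hypothetical placeholders, not values from the study.

```python
import math


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected AIC: AIC + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def rank_models(models, n):
    """Return (name, AICc, delta, Akaike weight) rows sorted from best to worst."""
    scored = [(name, aicc(ll, k, n)) for name, ll, k in models]
    best = min(score for _, score in scored)
    rel = [(name, score, score - best) for name, score in scored]
    total = sum(math.exp(-d / 2) for _, _, d in rel)
    ranked = [(name, s, d, math.exp(-d / 2) / total) for name, s, d in rel]
    return sorted(ranked, key=lambda row: row[1])


# Hypothetical candidates: (name, maximised log-likelihood, number of parameters).
candidates = [
    ("status", -820.4, 4),
    ("status + pastoral", -819.6, 5),
    ("status + pastoral + mining", -819.3, 6),
]
for name, score, delta, weight in rank_models(candidates, n=500):
    print(f"{name:28s} AICc={score:8.2f} delta={delta:5.2f} weight={weight:.2f}")
```

    The small-sample correction grows as k approaches n, which is why AICc rather than plain AIC is typically preferred when the number of observations per parameter is modest.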

    shapefiles:

    1. Great Western Woodlands boundary

    2. sample areas (layer file shows sample areas by category). We used stratified random sampling to distribute 24 circular sample areas, each 25 km in diameter, among the 8 mining and pastoral categories. We used circular sample areas to minimise the edge-to-area ratio of the sample areas and therefore maximise the extent to which the sample areas reflected the category represented rather than the adjacent landscape.

    3. linear infrastructure extending beyond gww boundary by ~100 km. This dataset, compiled from 23 different sources, represents the most comprehensive spatial dataset for the GWW available at the time of publication, to KR’s knowledge. However, it contains a number of different sources of error and should not be assumed to be an up-to-date, accurate dataset (note there is a more detailed metadata document inside this folder).

    4. linear infrastructure footprints. Linear features buffered by average width of that linear infrastructure type for each sample area. Linear features include paved roads and railways, unpaved roads, mapped tracks, and unmapped tracks (digitized from aerial images in this study).

    5. digitised tracks. All linear infrastructure that had not already been mapped in #8 above. Manually digitised from high-resolution aerial images in this study.

    6. digitised hub infrastructure. Development footprints of all non-linear (i.e. polygonal) anthropogenic disturbance, including mine pits, waste rock dumps, mining camps and accommodation villages, dams, and other cleared areas, manually digitised from high-resolution aerial imagery in this study.

    7. edge effect zones. Polygons created by buffering the development footprint as described in Appendix 3 of the article. These zones around the direct development footprint represent offsite impact risk for each type of infrastructure, using a hypothesized set of risk buffers based on edge effect distances reported in the literature for species and processes from around the world. Three scenarios are represented: conservative, moderate, and maximal.
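
    The geometry behind the circular sample design is easy to check: a circle 25 km in diameter covers π × 12.5² ≈ 491 km², consistent with the ~490 km² sample areas listed in the csv data, and a circle has the lowest perimeter-to-area ratio of any shape with that area. The comparison with an equal-area square below is an illustration added here, not part of the study.

```python
import math

diameter_km = 25.0
radius_km = diameter_km / 2
area_km2 = math.pi * radius_km ** 2                    # area of one circular sample area
circle_ratio = (2 * math.pi * radius_km) / area_km2   # perimeter / area, equals 2 / r

# A square with the same area, for comparison.
side_km = math.sqrt(area_km2)
square_ratio = (4 * side_km) / area_km2

print(f"Area per sample area: {area_km2:.1f} km^2")
print(f"Total sampled (24 areas): {24 * area_km2:.0f} km^2")
print(f"Edge-to-area ratio, circle: {circle_ratio:.3f} per km")
print(f"Edge-to-area ratio, square: {square_ratio:.3f} per km")
```

    The circle's edge-to-area ratio is about 11% lower than the square's, so proportionally less of each sample area sits near its boundary and is influenced by the adjacent landscape.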

    The abstract for the publication is as follows:

    Context: The acceleration of infrastructure development presents many challenges for the mitigation of ecological impacts. The type, extent, and cumulative effects of multiple developments must be quantified to enable mitigation.

    Objectives: We quantified anthropogenic development footprints in a globally significant and relatively intact region. We identified the proportion accounted for by linear infrastructure (e.g. roads), including infrastructure that is currently unmapped; investigated the importance of key landscape drivers; and explored potential ramifications of offsite impacts (edge effects).

    Methods: We quantified direct development footprints of linear and 'hub' infrastructure in the Great Western Woodlands (GWW) in south-western Australia, using digitisation and extrapolation from a stratified random sample of aerial imagery. We used spatial datasets and literature resources to identify predictors of development footprint extent and to calculate hypothetical ‘edge effect zones’.

    Results: Unmapped linear infrastructure, only detectable through manual digitisation, accounts for the greatest proportion of the direct development footprint. Across the 160,000 km² GWW, the estimated development footprint is 690 km², of which 67% consists of linear infrastructure and the remainder is ‘hub’ infrastructure. An estimated 150,000 km of linear infrastructure exists in the study area, equating to an average of ~1 km per km². Beyond the direct footprint, a further 4,000–55,000 km² (3–35% of the region) lies within edge effect zones.

    Conclusions: This study highlights the pervasiveness of linear infrastructure and hence the importance of managing its cumulative impacts as a key component of landscape conservation. Our methodology can be applied to other relatively intact landscapes worldwide.

  6. Data Sheet 1_Neonatal Mortality in Burkina Faso: An Exploratory Analysis of...

    • frontiersin.figshare.com
    pdf
    Updated Nov 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hervé Bassinga (2025). Data Sheet 1_Neonatal Mortality in Burkina Faso: An Exploratory Analysis of Determinants and Geospatial Inequalities.pdf [Dataset]. http://doi.org/10.3389/ijph.2025.1608901.s002
    Available download formats: pdf
    Dataset updated
    Nov 24, 2025
    Dataset provided by
    Frontiers
    Authors
    Hervé Bassinga
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Burkina Faso
    Description

    Objectives: This article analyzes the factors associated with neonatal mortality in Burkina Faso, as well as commune-level inequalities in this mortality.

    Methods: The analysis is based on data from the 2021 Demographic and Health Survey (DHS). It draws on a representative sample of 7,225 children. The determinants of mortality were examined using a log-binomial regression model. For the analysis of geospatial inequalities, the Richardson method was applied to classify communes according to their probability of achieving SDG target 3.2.2 by 2030, distinguishing areas with low, medium, and high likelihoods of attainment.

    Results: The analysis reveals an excess risk of neonatal mortality linked to male sex, multiple births, short birth intervals, low maternal education, and limited access to health services. According to the Richardson classification, all communes are on track to meet SDG target 3.2.2 (12‰ by 2030). However, 37 communes show higher residual risks and require close monitoring.

    Conclusion: These results underline the importance of multi-sectoral interventions adapted to local territorial specificities in order to sustain the reduction of neonatal mortality in Burkina Faso.
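
    A log-binomial model uses a log link, so for a binary exposure exp(β) is a risk ratio: the outcome risk among exposed children divided by the risk among unexposed children. With no other covariates, that estimate reduces to the crude risk ratio from a 2×2 table, sketched below with a Wald confidence interval on the log scale. The counts are hypothetical, not DHS figures, and the adjusted estimates in the article additionally control for the other determinants. The SDG target of 12‰ corresponds to 12 neonatal deaths per 1,000 live births.

```python
import math


def risk_ratio(exposed_deaths, exposed_total, unexposed_deaths, unexposed_total, z=1.96):
    """Crude risk ratio with a Wald 95% CI on the log scale (Katz method)."""
    risk_exp = exposed_deaths / exposed_total
    risk_unexp = unexposed_deaths / unexposed_total
    rr = risk_exp / risk_unexp
    se_log = math.sqrt(1 / exposed_deaths - 1 / exposed_total
                       + 1 / unexposed_deaths - 1 / unexposed_total)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi


def per_mille(deaths, live_births):
    """Rate per 1,000 live births (the unit of the 12 per mille SDG target)."""
    return 1000 * deaths / live_births


# Hypothetical counts: neonatal deaths among multiple vs single births.
rr, lo, hi = risk_ratio(exposed_deaths=12, exposed_total=200,
                        unexposed_deaths=150, unexposed_total=7000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"Overall rate: {per_mille(162, 7200):.1f} per 1,000 live births")
```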

  7. UNESCO World Heritage Sites 2019

    • kaggle.com
    zip
    Updated Aug 3, 2020
    Ujwal Kandi (2020). UNESCO World Heritage Sites 2019 [Dataset]. https://www.kaggle.com/datasets/ujwalkandi/unesco-world-heritage-sites/code
    Available download formats: zip (806,132 bytes)
    Dataset updated
    Aug 3, 2020
    Authors
    Ujwal Kandi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    Context

    A UNESCO World Heritage Site is a site that has been nominated for the United Nations Educational, Scientific and Cultural Organization's International World Heritage program. The program aims to catalogue and preserve sites of outstanding importance, either cultural or natural, to the common heritage of humankind.

    A World Heritage Site is a landmark or area with legal protection under an international convention administered by the United Nations Educational, Scientific and Cultural Organization (UNESCO). World Heritage Sites are designated by UNESCO for having cultural, historical, scientific or other forms of significance. The sites are judged to contain "cultural and natural heritage around the world considered to be of outstanding value to humanity." To be selected, a World Heritage Site must be a landmark that is unique in some respect, is geographically and historically identifiable, and has special cultural or physical significance. For example, World Heritage Sites might be ancient ruins or historical structures, buildings, cities, deserts, forests, islands, lakes, monuments, mountains, or wilderness areas. As of June 2020, a total of 1,121 World Heritage Sites (869 cultural, 213 natural, and 39 mixed properties) exist across 167 countries; the three countries with the most sites are China, Italy (55 each) and Spain (48).

    Content

    This dataset contains spatial data of 1121 World Heritage Sites that were listed into the World Heritage List by UNESCO.

    Acknowledgements

  8. Oxygen Exposure for Benthic Megafauna near San

    • kaggle.com
    zip
    Updated Feb 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). Oxygen Exposure for Benthic Megafauna near San [Dataset]. https://www.kaggle.com/datasets/thedevastator/oxygen-exposure-for-benthic-megafauna-near-san-d/code
    Available download formats: zip (89,559 bytes)
    Dataset updated
    Feb 16, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Oxygen Exposure for Benthic Megafauna near San Diego

    Spatially Varying Environmental Risk

    By [source]

    About this dataset

    This dataset offers an in-depth look at environmental conditions in the underwater world off Southern California's coast. It provides information on spatial variation in risk, such as oxygen exposure levels, depths, and habitat criteria for 53 species of benthic and epibenthic megafauna recorded during the three-year study. The data can provide insight into the dynamics of aquatic life and could inform improved management strategies for protecting these species. Because the oceans play an important role in the planet's fragile ecosystem, a better understanding of their condition could support greater long-term marine sustainability. Ultimately, this dataset may help answer questions about how ocean life is responding to intense human activity and its effects on coastal communities.

    How to use the dataset

    • Download the dataset: The dataset contains two .csv files, each containing data from the three-year study on oxygen exposure for benthic and epibenthic megafauna off the coast of San Diego in Southern California. Download both files and save them for further analysis.

    • Familiarize yourself with the datasets: Each file covers a different part of the study (for example, SpeciesMetadata contains species-level information on 53 species of benthic and epibenthic megafauna). Read through each file carefully to understand what each column contains.

    • Clean up any outliers or missing values: Once you understand which columns are important for your analysis, you can begin cleaning up any outliers or missing values that may be present in your dataset. This is an important step as it will help ensure that further analysis is performed accurately.

    • Choose an appropriate visualization method: Depending on the results you want to show, choose an appropriate visualization (e.g., bar plot, scatter plot). During exploratory analysis, also consider whether coloring points by category would improve the legibility of your figures.

    • Choose a statistical test suited to the project: Once your visuals are ready, interpret the results with a statistical test chosen according to the number of groups being compared (e.g., a t-test for two groups or ANOVA for more than two). Make sure you understand key outputs such as p-values, so that you can judge whether there are significant differences between treatments when comparing distributions across the samples or populations studied. Also check that the sample size is large enough to give the chosen test adequate statistical power.
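The cleaning and testing steps above can be sketched with the Python standard library alone. This is a minimal illustration, not part of the dataset: the column names are not listed on this page, so any names you pass in are placeholders, the helper functions are hypothetical, and the IQR rule (Tukey's fences) is just one common way to flag outliers. `one_way_anova_f` returns only the F statistic; for two groups it equals the square of the pooled-variance t statistic, and a p-value would additionally need the F distribution (for example via `scipy.stats.f_oneway`).

```python
import csv
from collections import defaultdict
from statistics import mean, quantiles


def complete_rows(lines, required):
    """Parse CSV text (an open file or any iterable of lines) and drop rows
    with an empty or missing value in any of the `required` columns."""
    reader = csv.DictReader(lines)
    return [row for row in reader
            if all((row.get(col) or "").strip() for col in required)]


def group_numeric(rows, group_col, value_col):
    """Collect float values of `value_col` per level of `group_col`,
    skipping entries that cannot be parsed as numbers."""
    groups = defaultdict(list)
    for row in rows:
        try:
            value = float(row[value_col])
        except (TypeError, ValueError):
            continue  # non-numeric or missing entry
        groups[row[group_col]].append(value)
    return dict(groups)


def iqr_filter(values, k=1.5):
    """Drop outliers outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    Needs at least two values (a statistics.quantiles requirement)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [v for v in values if q1 - spread <= v <= q3 + spread]


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square. Needs >= 2 groups and more values than groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

As a usage sketch with placeholder column names, open the file with `open("ROVObservationData.csv", newline="")`, pass the handle to `complete_rows(fh, ["Species", "Oxygen"])`, group with `group_numeric(rows, "Species", "Oxygen")`, optionally run each group through `iqr_filter`, and pass the resulting lists to `one_way_anova_f`.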

    Research Ideas

    • Comparing the effects of different environmental factors (depth, temperature, salinity etc.) on depth-specific distributions of oxygen and benthic megafauna.
    • Identifying and mapping vulnerable areas for benthic species based on environmental factors and oxygen exposure patterns.
    • Developing models to predict underlying spatial risk variables for endangered species to inform conservation efforts in the study area.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: ROVObservationData.csv

    File: SpeciesMetadata.csv

Lars De Sloover; Haosheng Huang; Mauk Hillewaert; Jana Verdoodt; Nico Van de Weghe (2025). Comparing the effects of Euclidean distance matching and dynamic time warping in the clustering of COVID-19 evolution: a spatiotemporal analysis of COVID-19 dynamics across Europe [Dataset]. http://doi.org/10.6084/m9.figshare.30294498.v1

Data from: Comparing the effects of Euclidean distance matching and dynamic time warping in the clustering of COVID-19 evolution: a spatiotemporal analysis of COVID-19 dynamics across Europe

Related Article
Available download formats
html
Dataset updated
Oct 7, 2025
Dataset provided by
Taylor & Francis
Authors
Lars De Sloover; Haosheng Huang; Mauk Hillewaert; Jana Verdoodt; Nico Van de Weghe
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A comprehensive understanding of COVID-19’s spatiotemporal progression is crucial for devising effective containment strategies, yet existing mapping approaches often fail to capture the pandemic’s multifaceted evolution. This study addresses that gap by focusing on how well different time-series clustering methods handle the complexities of misalignment in outbreak trajectories. Specifically, we investigate the spatiotemporal dynamics of COVID-19 cases across NUTS 2 regions in Europe during the second pandemic wave in the winter of 2020–2021. We employ time series clustering using Euclidean Distance Matching (EDM) and Dynamic Time Warping (DTW) to identify distinct patterns of pandemic progression. The hierarchical clustering results, visualized through heatmap dendrograms, chorochromatic cluster maps, and mean time series plots, reveal heterogeneous spatiotemporal patterns at different levels of clustering granularity. DTW outperforms EDM in capturing the temporal dynamics, yielding better-defined clusters in terms of temporal similarity. While DTW generally shows higher spatial contiguity values, EDM maintains statistically significant spatial coherence across clusters, especially at higher numbers of clusters. We discuss the trade-offs between optimizing for temporal similarity and spatial contiguity, and their implications for understanding the spatiotemporal dynamics of COVID-19. The findings highlight the importance of considering both temporal and spatial aspects when analyzing the spread of infectious diseases.
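The description contrasts Euclidean Distance Matching and Dynamic Time Warping as distance measures for hierarchical clustering of regional case curves. The sketch below illustrates that difference in plain Python. It is not the authors' pipeline: their normalization, any warping-window constraint, the linkage method and the cluster counts are not stated here, so unconstrained DTW with a squared-difference cost and average linkage are assumptions, and all function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Lock-step Euclidean distance: point i of one series is compared
    only with point i of the other, so the series must be equally long."""
    if len(a) != len(b):
        raise ValueError("Euclidean matching needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained dynamic time warping with squared-difference local cost.
    cost[i][j] is the cheapest alignment of a[:i] with b[:j]; each step may
    advance one series, the other, or both, so shifted peaks can line up."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


def average_linkage(series, dist, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters with
    the smallest mean pairwise distance until `n_clusters` remain."""
    d = {pair: dist(series[pair[0]], series[pair[1]])
         for pair in combinations(range(len(series)), 2)}

    def between(c1, c2):
        total = sum(d[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(series))]
    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: between(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

For instance, with two hypothetical curves that share the same peak shifted by two time steps, `early = [0, 1, 5, 1, 0, 0, 0, 0]` and `late = [0, 0, 0, 1, 5, 1, 0, 0]`, plus a flat series `flat = [1] * 8`, `dtw(early, late)` is 0.0, while `euclidean(early, late)` (about 7.21) exceeds `euclidean(early, flat)` (about 4.58). Average linkage with DTW therefore groups the two peaked curves together and Euclidean matching does not, a toy illustration of how DTW tolerates misaligned outbreak timing.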
