43 datasets found
  1. Identifying the appropriate spatial resolution for the analysis of crime...

    • plos.figshare.com
    zip
    Updated Jun 3, 2023
    Nick Malleson; Wouter Steenbeek; Martin A. Andresen (2023). Identifying the appropriate spatial resolution for the analysis of crime patterns [Dataset]. http://doi.org/10.1371/journal.pone.0218324
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nick Malleson; Wouter Steenbeek; Martin A. Andresen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: A key issue in the analysis of many spatial processes is the choice of an appropriate scale for the analysis. Smaller geographical units are generally preferable for the study of human phenomena because they are less likely to cause heterogeneous groups to be conflated. However, it can be harder to obtain data for small units, and small-number problems can frustrate quantitative analysis. This research presents a new approach that can be used to estimate the most appropriate scale at which to aggregate point data to areas.

    Data and methods: The proposed method works by creating a number of regular grids with iteratively smaller cell sizes (increasing grid resolution) and estimating the similarity between two realisations of the point pattern at each resolution. The method is applied first to simulated point patterns and then to real, publicly available crime data from the city of Vancouver, Canada. The crime types tested are residential burglary, commercial burglary, theft from vehicle, and theft of bike.

    Findings: The results provide evidence for the size of spatial unit that is most appropriate for the different types of crime studied. Importantly, the results depend on both the number of events in the data and the degree of spatial clustering, so a single ‘appropriate’ scale is not identified. The method is nevertheless useful as a means of better estimating what spatial scale might be appropriate for a particular piece of analysis.
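    The grid-comparison idea described under "Data and methods" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the two realisations come from randomly splitting one set of points into halves, uses a square study area anchored at (0, 0), and swaps in Pearson correlation of per-cell counts as the similarity measure (the paper's own measure may differ). All function names are hypothetical.

```python
import random
from collections import Counter


def grid_counts(points, cell_size, cells_per_side):
    """Count points per cell of a square grid anchored at (0, 0).

    Points on the far edge are clamped into the last row/column, so each
    point is counted exactly once.
    """
    counts = Counter()
    for x, y in points:
        i = min(int(x // cell_size), cells_per_side - 1)
        j = min(int(y // cell_size), cells_per_side - 1)
        counts[(i, j)] += 1
    # Flatten in a fixed order, including empty cells (count 0).
    return [counts[(i, j)] for i in range(cells_per_side) for j in range(cells_per_side)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def similarity_by_resolution(points, extent, resolutions, seed=0):
    """Compare two random halves of the points on increasingly fine grids.

    `resolutions` lists cells per side; larger values mean smaller cells.
    Returns {cells_per_side: similarity}.
    """
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    results = {}
    for n in resolutions:
        cell_size = extent / n
        results[n] = pearson(grid_counts(first, cell_size, n),
                             grid_counts(second, cell_size, n))
    return results
```

    Calling `similarity_by_resolution(points, extent, [2, 4, 8, 16, 32])` returns one score per resolution. Under this sketch, the resolution at which the score starts to fall away as cells shrink gives only a rough hint of how fine a grid the data can support, which is consistent with the abstract's point that the answer depends on the number of events and the degree of clustering.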

  2. Unified feature association networks through integration of transcriptomic...

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Ryan S. McClure; Jason P. Wendler; Joshua N. Adkins; Jesica Swanstrom; Ralph Baric; Brooke L. Deatherage Kaiser; Kristie L. Oxford; Katrina M. Waters; Jason E. McDermott (2023). Unified feature association networks through integration of transcriptomic and proteomic data [Dataset]. http://doi.org/10.1371/journal.pcbi.1007241
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ryan S. McClure; Jason P. Wendler; Joshua N. Adkins; Jesica Swanstrom; Ralph Baric; Brooke L. Deatherage Kaiser; Kristie L. Oxford; Katrina M. Waters; Jason E. McDermott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-throughput multi-omics studies and corresponding network analyses of multi-omic data have rapidly expanded their impact over the last 10 years. As biological features of different types (e.g. transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities composed of nodes of the same type. We describe here network-building methods that can maximize edges between nodes of different data types, leading to integrated networks: networks that have a large number of edges linking nodes of different -omic types (transcripts, proteins, lipids, etc.). We systematically rank several network inference methods and demonstrate that, in many cases, a random forest method, GENIE3, produces the most integrated networks. This increase in integration does not come at the cost of accuracy, as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated and receptor-mediated Dengue virus cell invasion. A number of functional pathways showed centrality differences between the two networks, including genes responding to both GM-CSF and IL-4, which had a higher centrality value in the antibody-mediated than in the receptor-mediated Dengue network. Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are some of the first to specifically highlight and address the challenges of assembling such multi-omic networks and of inferring the greatest number of interactions from different data types. The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms, and confirm mechanisms of disease.
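    The notion of an "integrated" network can be made concrete by counting how many edges join nodes of different -omic types. The sketch below assumes the inference step (GENIE3 or any other method) has already produced a weight for each candidate edge; it keeps the strongest k edges and reports the cross-type fraction. The node names, the top-k thresholding rule, and the metric itself are illustrative assumptions, not the exact procedure used in the paper.

```python
def top_edges(weights, k):
    """Return the k highest-weighted edges from a {(u, v): weight} mapping."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


def cross_type_fraction(edges, node_type):
    """Fraction of edges whose two endpoints have different -omic types."""
    if not edges:
        return 0.0
    cross = sum(1 for u, v in edges if node_type[u] != node_type[v])
    return cross / len(edges)


# Hypothetical example: two transcripts (g*) and two proteins (p*).
node_type = {"g1": "transcript", "g2": "transcript", "p1": "protein", "p2": "protein"}
weights = {("g1", "p1"): 0.9, ("g1", "g2"): 0.8, ("g2", "p2"): 0.7, ("p1", "p2"): 0.3}
print(cross_type_fraction(top_edges(weights, 3), node_type))
```

    Comparing this fraction across inference methods at the same k is one simple way to rank them by integration while holding network size fixed.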

  3. Insurance Dataset for Data Engineering Practice

    • kaggle.com
    zip
    Updated Sep 24, 2025
    KPOVIESI Olaolouwa Amiche Stéphane (2025). Insurance Dataset for Data Engineering Practice [Dataset]. https://www.kaggle.com/datasets/kpoviesistphane/insurance-dataset-for-data-engineering-practice
    Available download formats: zip (475362 bytes)
    Dataset updated
    Sep 24, 2025
    Authors
    KPOVIESI Olaolouwa Amiche Stéphane
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Insurance Dataset for Data Engineering Practice

    Overview

    A realistic synthetic French insurance dataset specifically designed for practicing data cleaning, transformation, and analytics with PySpark and other big data tools. This dataset contains intentional data quality issues commonly found in real-world insurance data.

    Dataset Contents

    📊 Three Main Tables:

    • contracts.csv (~15,000 rows) - Insurance contracts with client information
    • claims.csv (~6,000 rows) - Insurance claims with damage and settlement details
    • vehicles.csv (~12,000 rows) - Vehicle information for auto insurance contracts

    🗺️ Geographic Coverage:

    • French cities with realistic postal codes
    • Risk zone classifications (High/Medium/Low)
    • Regional pricing coefficients

    🏷️ Product Types:

    • Auto Insurance (majority)
    • Home Insurance
    • Life Insurance
    • Health Insurance

    🎯 Intentional Data Quality Issues

    Perfect for practicing data cleaning and transformation:

    Date Format Issues:

    • Mixed formats: 2024-01-15, 15/01/2024, 01/15/2024
    • String storage requiring parsing and standardization
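    The mixed date formats above can be normalized with a small parser. This plain-Python sketch (function name hypothetical) handles the three listed patterns. A value such as 03/04/2024 is ambiguous, so it assumes the French day-first convention whenever neither field exceeds 12. In PySpark, the same idea could be expressed with `to_date()` and a `when().otherwise()` chain over candidate formats.

```python
from datetime import date


def parse_mixed_date(raw):
    """Parse 2024-01-15, 15/01/2024 or 01/15/2024 into a date (None if blank)."""
    s = raw.strip() if raw else ""
    if not s:
        return None
    if "-" in s:  # ISO: YYYY-MM-DD
        year, month, day = (int(p) for p in s.split("-"))
        return date(year, month, day)
    first, second, year = (int(p) for p in s.split("/"))
    if first > 12:        # must be DD/MM/YYYY
        day, month = first, second
    elif second > 12:     # must be MM/DD/YYYY
        month, day = first, second
    else:                 # ambiguous: assume French DD/MM/YYYY
        day, month = first, second
    return date(year, month, day)
```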

    Price Format Inconsistencies:

    • Multiple currency formats: 1250.50€, €1250.50, 1250.50 EUR, $1375.55
    • Missing currency symbols: 1250.50
    • Written formats: 1250.50 euros
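    The price strings can be reduced to an amount and a currency code. This sketch (hypothetical helper) extracts the first number with a regular expression, maps a `$` sign to USD, and assumes everything else, including values with no symbol, is in euros. It does not convert USD amounts, since no exchange rate is given. `Decimal` avoids binary floating-point rounding for money; in PySpark, `regexp_replace()` would play the role of the regular expression.

```python
import re
from decimal import Decimal

_AMOUNT = re.compile(r"\d+(?:[.,]\d+)?")


def parse_price(raw):
    """Return (amount, currency) from strings like '€1250.50' or '$1375.55'."""
    if not raw:
        return None
    match = _AMOUNT.search(raw)
    if match is None:
        return None
    # Accept a comma as the decimal separator too (assumption).
    amount = Decimal(match.group().replace(",", "."))
    # Assumption: only '$' marks USD; '€', 'EUR', 'euros' or no marker mean EUR.
    currency = "USD" if "$" in raw else "EUR"
    return amount, currency
```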

    Missing Data Patterns:

    • Strategic missingness in age (8%), CSP (12%), expert_id (20-25%)
    • Realistic patterns based on business logic
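    To check the stated missingness rates against the files, a per-column profile is enough. This sketch treats blank or absent values as missing. The column names `age`, `CSP`, and `expert_id` come from the description above, but the exact header spelling and which CSV file holds each column are assumptions; the helper names are hypothetical.

```python
import csv


def missing_rates(rows, columns):
    """Share of rows in which each column is absent or blank.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    """
    total = 0
    missing = {c: 0 for c in columns}
    for row in rows:
        total += 1
        for c in columns:
            value = row.get(c)
            if value is None or value.strip() == "":
                missing[c] += 1
    return {c: (missing[c] / total if total else 0.0) for c in columns}


def profile_csv(path, columns):
    """Open a CSV file and compute missing rates for the given columns."""
    with open(path, newline="", encoding="utf-8") as f:
        return missing_rates(csv.DictReader(f), columns)
```

    For example, `profile_csv("contracts.csv", ["age", "CSP"])` should land near the 8% and 12% quoted above, provided those headers exist in that file.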

    Categorical Inconsistencies:

    • Gender: M, F, Male, Female, empty strings
    • Power units: 150 HP, 150hp, 150 CV, 111 kW, missing values
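    The categorical variants can be mapped to canonical values with lookup tables. In this sketch (helper names hypothetical), gender codes collapse to M/F, with blanks becoming None, and power values are converted to kilowatts. The conversion factors are an assumption about the units: HP is treated as mechanical horsepower (0.745699872 kW) and CV as metric horsepower (0.73549875 kW), which puts 150 HP at about 111.9 kW and 150 CV at about 110.3 kW.

```python
import re

_GENDER = {"m": "M", "male": "M", "f": "F", "female": "F"}

# Assumption: HP = mechanical horsepower, CV = metric horsepower (cheval-vapeur).
_KW_PER_UNIT = {"kw": 1.0, "hp": 0.745699872, "cv": 0.73549875}
_POWER = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(hp|cv|kw)\s*$", re.IGNORECASE)


def normalize_gender(raw):
    """Map M/F/Male/Female (any case) to 'M' or 'F'; blanks and unknowns to None."""
    if raw is None:
        return None
    return _GENDER.get(raw.strip().lower())


def power_to_kw(raw):
    """Convert '150 HP', '150hp', '150 CV' or '111 kW' to kW (1 decimal); else None."""
    if raw is None:
        return None
    match = _POWER.match(raw)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return round(value * _KW_PER_UNIT[unit], 1)
```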

    Data Type Issues:

    • Numeric values stored as strings
    • Mixed data types requiring casting

    🚀 Perfect for Practicing:

    PySpark Operations:

    • to_date() and date parsing functions
    • regexp_replace() for price cleaning
    • when().otherwise() conditional logic
    • cast() for data type conversions
    • fillna() and dropna() strategies

    Data Engineering Tasks:

    • ETL pipeline development
    • Data validation and quality checks
    • Join operations across related tables
    • Aggregation with business logic
    • Data standardization workflows

    Analytics & ML:

    • Customer segmentation
    • Claim frequency analysis
    • Premium pricing models
    • Risk assessment by geography
    • Churn prediction

    🏢 Business Context

    Realistic insurance business rules implemented:

    • Age-based premium adjustments
    • Geographic risk zone pricing
    • Product-specific claim patterns
    • Seasonal claim distributions
    • Client lifecycle status transitions

    💡 Use Cases:

    • Data Engineering Bootcamps: Hands-on PySpark practice
    • SQL Training: Complex joins and aggregations
    • Data Science Projects: End-to-end ML pipeline development
    • Business Intelligence: Dashboard and reporting practice
    • Data Quality Workshops: Cleaning and validation techniques

    🔧 Tools Compatibility:

    • Apache Spark / PySpark
    • Pandas / Python
    • SQL databases
    • Databricks
    • Google Cloud Dataflow
    • AWS Glue

    📈 Difficulty Level:

    Intermediate - Suitable for learners with basic Python/SQL knowledge ready to tackle real-world data challenges.

    Generated with realistic French business context and intentional quality issues for educational purposes. All data is synthetic and does not represent real individuals or companies.

  4. Anonymization, Behavioral and Privacy User Profile

    • kaggle.com
    zip
    Updated Nov 23, 2024
    Rasika Ekanayaka @ devLK (2024). Anonymization, Behavioral and Privacy User Profile [Dataset]. https://www.kaggle.com/datasets/rasikaekanayakadevlk/anonymization-behavioral-and-privacy-user-profile
    Available download formats: zip (215585 bytes)
    Dataset updated
    Nov 23, 2024
    Authors
    Rasika Ekanayaka @ devLK
    Description

    This dataset combines comprehensive data from multiple sources, providing an integrated view of encryption techniques, user behavior patterns, privacy measures, and updated user profiles. It is designed for applications in data privacy, behavioral analysis, and user management.

    1. Anonymization and Encryption Data:
      Details on encryption types, algorithms, key lengths, and associated timestamps.
      Useful for analyzing encryption standards and their effectiveness in anonymization.
    2. Behavioral Data Collection:
      Captures user behavior patterns, including types of behaviors, frequency, and duration.
      Includes timestamps for trend analysis and anomaly detection.
    3. Privacy Encryption Data:
      Provides information on privacy types, encryption levels, and additional metadata.
      Helps in evaluating the adequacy of privacy measures and encryption practices.
    4. Updated User ID Dataset:
      Contains updated user details, including unique IDs, names, phone numbers, and email addresses.
      Acts as a reference for linking user profiles to behavioral and encryption data.
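    Because the Updated User ID table holds direct identifiers (names, phone numbers, email addresses) while the other tables are linked through user IDs, one common way to analyze the linked data without exposing identities is keyed pseudonymization: replace each ID with an HMAC-SHA256 token and drop the direct identifiers. The sketch below illustrates that general technique; it is not something the dataset prescribes, the field names are hypothetical, and the secret key must be stored separately from the data.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers in the user table.
DIRECT_IDENTIFIERS = {"name", "phone", "email"}


def pseudonymize(user_id, key):
    """Deterministic HMAC-SHA256 token: same ID and key always give the same token."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, key, id_field="user_id"):
    """Copy each record, drop direct identifiers, and replace the ID with its token."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        clean[id_field] = pseudonymize(str(rec[id_field]), key)
        out.append(clean)
    return out
```

    Because the token is deterministic for a given key, behavioral and encryption records that share a user ID still join correctly after pseudonymization.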
    

    Applications:

      Data Privacy and Security: Analyze encryption algorithms and privacy measures to ensure data protection.
      Behavioral Analysis: Identify trends, patterns, and anomalies in user behavior over time.
      User Management: Utilize user profiles for linking behaviors and encryption activities to individual identities.
      Research and Development: Aid in developing robust systems for anonymization, privacy, and user analytics.
    

    This dataset is structured for multi-purpose use cases, making it a valuable resource for researchers, data analysts, and developers working on privacy, security, and behavioral systems.

  5. Social Media Engagement Report

    • kaggle.com
    zip
    Updated Apr 13, 2024
    Ali Reda Elblgihy (2024). Social Media Engagement Report [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/social-media-engagement-report
    Available download formats: zip (49114657 bytes)
    Dataset updated
    Apr 13, 2024
    Authors
    Ali Reda Elblgihy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Documentation Process

    1. Data Preparation:
       • Upload the data into Power Query to assess quality and identify duplicate values, if any.
       • Verify data quality and types for each column, addressing any misspellings or inconsistencies.
    2. Data Management:
       • Duplicate the original data sheet for future reference and label the new sheet "Working File" to preserve the integrity of the original dataset.
    3. Understanding Metrics:
       • Clarify the meaning of the column headers, particularly the distinction between Impressions and Reach, and understand how Engagement Rate is calculated.
       • Engagement Rate formula: (likes + comments + shares) / Reach.
    4. Data Integrity Assurance:
       • Recognize that Impressions should be greater than or equal to Reach, since Impressions count total views while Reach counts the unique audience.
       • Investigate discrepancies between Reach and Impressions, identifying and resolving their root causes to ensure accurate reporting and analysis.
    5. Data Correction:
       • Work with the relevant team to understand the root cause of the discrepancies between Impressions and Reach and to correct the inaccurate values.
       • Identify rows where Reach exceeds Impressions, which may be caused by data transformation errors.
       • After the correction, update the dataset with the corrected Impressions and Reach values.
       • Recalculate the Engagement Rate after the correction.
    6. Data Enhancement:
       • Categorize Audience Age into three groups in a new column named "Age Group": "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (<30 years).
       • Split date and time into separate columns using the Text to Columns option.
    7. Temporal Analysis:
       • Add a column distinguishing weekends from weekdays, named "Weekday Type", to identify patterns and trends in engagement.
       • Categorize posting times into "Morning," "Afternoon," "Evening," and "Night" based on time intervals.
    8. Sentiment Analysis:
       • Fill blank cells in the Sentiment column with "Mixed Sentiment," denoting content with both positive and negative sentiment or ambiguity.
    9. Geographical Analysis:
       • Group countries and obtain continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php).
       • Add an "Audience Continent" column and use the XLOOKUP function to retrieve the corresponding continent.
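    Several of the steps above (the Engagement Rate formula, the Impressions/Reach check, and the age and weekday groupings) are easy to express as small functions, for example when reproducing the workflow outside Power Query. This is a hedged sketch with hypothetical names. The stated age bands overlap at 45 and leave age 30 unassigned, so the sketch assumes 45 counts as "Senior Adults" and 30 as "Adolescent Adults".

```python
from datetime import date


def engagement_rate(likes, comments, shares, reach):
    """(likes + comments + shares) / reach; None when reach is zero or negative."""
    if reach <= 0:
        return None
    return (likes + comments + shares) / reach


def reach_exceeds_impressions(impressions, reach):
    """True for rows that break the rule Impressions >= Reach."""
    return reach > impressions


def age_group(age):
    """Bucket an audience age into the three groups from step 6."""
    # Assumption: 45 -> Senior Adults and 30 -> Adolescent Adults, because the
    # stated bands (45+, 31-45, <30) overlap at 45 and leave 30 unassigned.
    if age >= 45:
        return "Senior Adults"
    if age >= 31:
        return "Mature Adults"
    return "Adolescent Adults"


def weekday_type(day: date) -> str:
    """'Weekend' for Saturday/Sunday, otherwise 'Weekday' (step 7)."""
    return "Weekend" if day.weekday() >= 5 else "Weekday"
```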

    Drawing Conclusions and Providing a Summary

    • The data is equally distributed across different categories, platforms, and over the years.
    • Most of our audience comprises senior adults (aged 45 and above).
    • Most of our audience exhibits mixed sentiments about our posts. However, an equal portion expresses consistent sentiments.
    • The majority of our posts were located in Africa.
    • The number of posts increased from the first year to the second year and remained relatively consistent for the third year.
    • The optimal time for posting is during the night on weekdays.
    • The highest engagement rates were observed in Croatia, followed by Malawi.
    • The number of posts targeting senior adults is significantly higher than for the other two categories. However, relative to the number of posts targeting them, the engagement rates for mature and adolescent adults are also noteworthy.

  6. Data from: Types, levels, and patterns of low-copy DNA sequence divergence,...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +2 more
    zip
    Updated Oct 6, 2011
    Andrew H. Paterson; Junkang Rong; Xiyin Wang; Stefan R. Schulze; Rosana O. Compton; T. D. Williams-Coplin; Valorie Goff; Peng W. Chee (2011). Types, levels, and patterns of low-copy DNA sequence divergence, and phylogenetic implications, for Gossypium genome types [Dataset]. http://doi.org/10.5061/dryad.fb5hk394
    Available download formats: zip
    Dataset updated
    Oct 6, 2011
    Dataset provided by
    Dryad
    Authors
    Andrew H. Paterson; Junkang Rong; Xiyin Wang; Stefan R. Schulze; Rosana O. Compton; T. D. Williams-Coplin; Valorie Goff; Peng W. Chee
    Time period covered
    Oct 6, 2011
    Description

    Supplemental Figures: This file includes three supplemental figures related to the paper. The figure legends are given below each figure. Files: Spplemental Figure.doc, Spplemental Figure.pdf.

    Supplemental Tables: This file includes four supplemental tables relevant to the paper. Supplemental Table 1 lists the locus name, GenBank accession number, primer sequence, and annealing temperature for all loci studied in this research. Supplemental Table 2 covers the diversity revealed in the pairwise comparison, including extended and single deletions, insertions, SNPs, polymorphic sites, and polymorphic base pairs. Supplemental Table 3 covers SNPs, polymorphic sites, and polymorphic base pairs between the A and D ancestral genomes. Supplemental Table 4 covers the diversity revealed in the three-way comparison, including extended and single deletions, insertions, SNPs, polymorphic sites, and polymorphic base pairs.

  7. Ship Fuel Consumption & CO2 Emissions Analysis

    • kaggle.com
    zip
    Updated Dec 15, 2024
    Fijabi J. Adekunle (2024). Ship Fuel Consumption & CO2 Emissions Analysis [Dataset]. https://www.kaggle.com/datasets/jeleeladekunlefijabi/ship-fuel-consumption-and-co2-emissions-analysis/code
    Available download formats: zip (30728 bytes)
    Dataset updated
    Dec 15, 2024
    Authors
    Fijabi J. Adekunle
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset provides a detailed overview of fuel consumption and CO2 emissions for various ship types operating in Nigerian waterways. It includes data on ship types, routes, engine efficiency, fuel consumption, month, and emissions, making it suitable for environmental impact studies, maritime operations optimization, and predictive modeling.

    Documentation

    Documentation for Ship Fuel Consumption & CO2 Emission Analysis

    Overview

    This project analyzes fuel consumption and CO2 emissions of various ship types operating in Nigerian waterways. By exploring the fuel efficiency and environmental impact of these vessels, we aim to provide actionable insights for optimizing maritime operations and reducing emissions.

    Dataset Description

    The dataset used in this project contains the following columns:

    • Ship Type: Categorizes ships into four main types: Fishing Trawler, Oil Service Boat, Surfer Boat, and Tanker Ship.
    • Fuel Consumption (Liters): The total fuel consumed by each ship type during operations.
    • CO2 Emission (Kg): The amount of carbon dioxide emitted based on fuel consumption.
    • Other Variables: Supporting data used for exploratory analysis.

    This dataset was generated to simulate realistic maritime operations in Nigeria, taking into account common ship types, fuel usage patterns, and emissions.

    Key Insights:

    1. Fuel Consumption by Ship Type:
       • Tanker Ships have the highest average fuel consumption, reflecting their larger size and cargo capacity.
       • Surfer Boats consume the least fuel, making them efficient for shorter, high-speed trips.
    2. CO2 Emissions: A strong positive correlation (r ≈ 0.997) exists between fuel consumption and CO2 emissions, confirming that higher fuel usage results in greater emissions.
    3. Statistical Analysis:
       • ANOVA tests reveal significant differences in fuel consumption across ship types.
       • Tukey HSD analysis pinpoints which ship types differ in fuel consumption and emissions, providing actionable insights.

    Future Work

    Predictive Analysis: A machine learning model can be explored to predict fuel consumption and CO2 emissions based on ship type and operational factors. This would aid in forecasting and planning for greener maritime logistics.

    Visualization

    Key visualizations in this project include:

    • Bar Charts: Compare average fuel consumption and CO2 emissions across ship types.
    • Correlation Matrix: Highlights the strong relationship between fuel and emissions.
    • ANOVA Plots: Illustrate statistical differences between groups.
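    The ANOVA result mentioned under Key Insights compares mean fuel consumption across ship types. A one-way ANOVA F statistic can be computed directly, as in this sketch (function name hypothetical): it takes one list of fuel values per ship type and returns F with its degrees of freedom. It is not the project's notebook code; for a p-value as well, `scipy.stats.f_oneway(*groups)` computes the same F statistic.

```python
def one_way_anova_f(groups):
    """One-way ANOVA: return (F, df_between, df_within) for lists of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

    A large F relative to the F distribution with (k - 1, n - k) degrees of freedom indicates that at least one ship type's mean differs; a post-hoc test such as Tukey HSD is then needed to say which pairs differ.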

    Usage

    This dataset and project are valuable for:

    • Maritime operators looking to optimize fuel efficiency.
    • Environmental agencies monitoring CO2 emissions.
    • Data scientists exploring use cases in transportation and environmental sustainability.

    Files Included

    1. Dataset: The raw data used for analysis.
    2. Jupyter Notebook: Contains the complete code for data cleaning, analysis, and visualization.
    3. Images: Realistic representations of the ship types analyzed.
    4. Documentation: This document, for reference.

    Acknowledgments

    This project was developed by Fijabi J. Adekunle as part of a data analysis portfolio project. Special thanks to the Kaggle platform for hosting the dataset and analysis.

  8. Sample data (five types of features of one participant)

    • figshare.com
    txt
    Updated Mar 29, 2022
    Hua Liao (2022). Sample data (five types of features of one participant) [Dataset]. http://doi.org/10.6084/m9.figshare.19443503.v1
    Available download formats: txt
    Dataset updated
    Mar 29, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Hua Liao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data (five types of features of one participant)

  9. New Oxford Dictionary of English, 2nd Edition

    • live.european-language-grid.eu
    • catalog.elra.info
    Updated Dec 6, 2005
    (2005). New Oxford Dictionary of English, 2nd Edition [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2276
    Dataset updated
    Dec 6, 2005
    License

    http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    This is Oxford University Press's most comprehensive single-volume dictionary, with 170,000 entries covering all varieties of English worldwide. The NODE data set constitutes a fully integrated range of formal data types suitable for language engineering and NLP applications, and it is available in XML or SGML.

    • Source dictionary data. The NODE data set includes all the information present in the New Oxford Dictionary of English itself, such as definition text, example sentences, grammatical indicators, and encyclopaedic material.
    • Morphological data. Each NODE lemma (both headwords and subentries) has a full listing of all possible syntactic forms (e.g. plurals for nouns, inflections for verbs, comparatives and superlatives for adjectives), tagged to show their syntactic relationships. Each form has an IPA pronunciation. Full morphological data is also given for spelling variants (e.g. typical American variants), and a system of links enables straightforward correlation of variant forms to standard forms. The data set thus provides robust support for all look-up routines and is equally viable for applications dealing with American and British English.
    • Phrases and idioms. The NODE data set provides a rich and flexible codification of over 10,000 phrasal verbs and other multi-word phrases. It features comprehensive lexical resources enabling applications to identify a phrase not only in the form listed in the dictionary but also in a range of real-world variations, including alternative wording, variable syntactic patterns, inflected verbs, optional determiners, etc.
    • Subject classification. Using a categorization scheme of 200 key domains, over 80,000 words and senses have been associated with particular subject areas, from aeronautics to zoology. As well as facilitating the extraction of subject-specific sub-lexicons, this also provides an extensive resource for document categorization and information retrieval.
    • Semantic relationships. The relationships between every noun and noun sense in the dictionary are being codified using an extensive semantic taxonomy modelled on the Princeton WordNet project (mapping to WordNet 1.7 is supported). This structure allows elements of the basic lexical database to function as a formal knowledge database, enabling functionality such as sense disambiguation and logical inference.

    Derived from the detailed and authoritative corpus-based research of Oxford University Press's lexicographic team, the NODE data set is a powerful asset for any task dealing with real-world contemporary English usage. By integrating a number of different data types into a single structure, it creates a coherent resource that can be queried along numerous axes, allowing open-ended exploitation by many kinds of language-related applications.

  10. Data from: Classification of crop types in central California from 2005 -...

    • catalog.data.gov
    • data.usgs.gov
    • +2 more
    Updated Oct 1, 2025
    U.S. Geological Survey (2025). Classification of crop types in central California from 2005 - 2020 [Dataset]. https://catalog.data.gov/dataset/classification-of-crop-types-in-central-california-from-2005-2020
    Dataset updated
    Oct 1, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Central California, California
    Description

    This dataset provides supporting materials for the publication "Crop type classification, trends, and patterns of central California agricultural fields from 2005 – 2020". The data release consists of two child datasets. The first, 'Labeled_CropType_Points', is a shapefile of randomly selected point locations at which crop types were verified using high-resolution imagery for each examined year across the study period (2005 - 2020). The second, 'Central_CA_Classified_Croplands', is also a shapefile but contains polygons of 9 crop types classified with a random forest machine learning classifier for central California for each examined year across the study period (2005 - 2020).

  11. Data from: Characterization of the Different El Niño Types and their Impacts...

    • scielo.figshare.com
    jpeg
    Updated May 30, 2023
    Juarez Viegas; Rita Valéria Andreoli; Mary Toshie Kayano; Luiz Antonio Candido; Rodrigo Augusto Ferreira de Souza; Denisi Holanda Hall; Aline Corrêa de Souza; Samia Regina Garcia; Gleice Guerreiro Temoteo; Wanda Isabella Diógenes Valentin (2023). Characterization of the Different El Niño Types and their Impacts in South America From Observed and Modeled Data [Dataset]. http://doi.org/10.6084/m9.figshare.8227220.v1
    Available download formats: jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO journals
    Authors
    Juarez Viegas; Rita Valéria Andreoli; Mary Toshie Kayano; Luiz Antonio Candido; Rodrigo Augusto Ferreira de Souza; Denisi Holanda Hall; Aline Corrêa de Souza; Samia Regina Garcia; Gleice Guerreiro Temoteo; Wanda Isabella Diógenes Valentin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South America
    Description

    Abstract: Recent studies have pointed to the existence of two El Niño (EN) types: Eastern Pacific or Canonical (EP) EN and Central Pacific or Modoki (CP) EN. In the present study, observed data and data simulated by three models of the Coupled Model Intercomparison Project phase 5 (CMIP5) were used to evaluate the impacts of the two EN types on South American precipitation from June-August of the EN onset year to March-May of the following year. The Centre National de Recherches Météorologiques (CNRM-CM5) model performed better in reproducing the observed SST anomaly patterns for the CP and EP EN types. The observed precipitation anomaly pattern associated with the EN events was better represented during the austral summer. In the case of the EP EN, this pattern features wetness (dryness) in southeastern (northern-northwestern) South America. The CNRM-CM5 and Hadley Centre Global Environmental Model (HadGEM2-ES) models reproduced this pattern. The Max Planck Institute Earth System Model (MPI-ESM-LR) reproduced the dryness over the north, but not the increased rainfall in the southeast or the reduced rainfall in the northwest of the continent. In the case of the CP EN, the observed impact on South American rainfall during the austral summer featured rainfall scarcity (excess) in northern and northwestern (southeastern) South America. The models reproduced this pattern; however, the HadGEM2-ES and MPI-ESM-LR models showed lower rainfall over northeastern Brazil than observed. Differences in the EN teleconnections explain the differences between the simulated patterns.

  12. Data from: Patterns and limitations of urban human mobility resilience under...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jan 8, 2017
    Qi Wang; John E. Taylor (2017). Patterns and limitations of urban human mobility resilience under the influence of multiple types of natural disaster [Dataset]. http://doi.org/10.5061/dryad.88354
    Available download formats: zip
    Dataset updated
    Jan 8, 2017
    Dataset provided by
    Dryad
    Authors
    Qi Wang; John E. Taylor
    Time period covered
    Jan 8, 2016
    Area covered
    World
    Description

    Patterns and Limitations of Urban Human Mobility Resilience under the Influence of Multiple Types of Natural Disaster (Original Data): The file includes the location data from the 15 natural disaster events used in this research.

  13. A description of the simulated point patterns used to test the algorithm.

    • plos.figshare.com
    xls
    Updated Jun 20, 2023
    Nick Malleson; Wouter Steenbeek; Martin A. Andresen (2023). A description of the simulated point patterns used to test the algorithm. [Dataset]. http://doi.org/10.1371/journal.pone.0218324.t001
    Available download formats: xls
    Dataset updated
    Jun 20, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nick Malleson; Wouter Steenbeek; Martin A. Andresen
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The parameter a is chosen so that each point pattern contains approximately 3,000 points. Parameters b and c determine the amount of clustering; larger numbers produce more clustering.

  14. Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Jul 12, 2022
    Zenodo (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6832242?locale=fr
    Available download formats: unknown (642961582)
    Dataset updated
    Jul 12, 2022
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LifeSnaps Dataset Documentation

    Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet limited data exist on the association between large-scale, in-the-wild physical activity patterns, sleep, stress, and overall health on the one hand, and behavioral patterns and psychological measurements on the other. This gap stems from challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically distributed dataset containing a plethora of anthropological data. The data were collected unobtrusively over more than 4 months from n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data openly available to empower future research. We envision that releasing this large-scale, multi-modal, real-world dataset will open novel research opportunities and potential applications in medical digital innovation, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction. The following instructions will get you started with the LifeSnaps dataset and complement the original publication.

    Data Import: Reading CSV. For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files with any programming language. For example, in Python, you can read the files into a pandas DataFrame with the pandas.read_csv() function.

    Data Import: Setting up MongoDB (Recommended). To take full advantage of the LifeSnaps dataset, we recommend using the raw, complete data by importing the LifeSnaps MongoDB database. To do so, open a terminal/command prompt and run the following command for each collection in the DB. Make sure MongoDB Database Tools are installed. For the Fitbit data, run the following:

        mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
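    As a minimal sketch of the CSV route, the snippet below reads a daily table with pandas.read_csv() and sorts each participant's rows by date. The column names (id, date, steps) and the sample values are hypothetical placeholders, not the dataset's documented schema. An inline sample stands in for a real LifeSnaps file so the example runs on its own.

```python
import io

import pandas as pd


def load_daily_table(source):
    """Read a daily-granularity CSV and sort rows per participant by date.

    `source` can be a path or a file-like object. The column names "id" and
    "date" are hypothetical; check the headers of the real LifeSnaps files.
    """
    df = pd.read_csv(source, parse_dates=["date"])
    return df.sort_values(["id", "date"]).reset_index(drop=True)


# Inline stand-in for a real CSV file, so the example runs without the dataset.
sample = io.StringIO(
    "id,date,steps\n"
    "p02,2021-05-25,8120\n"
    "p01,2021-05-25,10412\n"
    "p01,2021-05-24,6530\n"
)
daily = load_daily_table(sample)
print(daily.groupby("id")["steps"].mean())
```

    With a real file, pass its path instead of the StringIO object. Reading it this way is only a convenience and does not replace the MongoDB import when the raw, second-level data are needed.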

  15. Replication Data for: The influence of hand depiction types on behavioural...

    • borealisdata.ca
    Updated Nov 24, 2025
    Jonathan Marotta (2025). Replication Data for: The influence of hand depiction types on behavioural patterns in laterality judgments. [Dataset]. http://doi.org/10.5683/SP3/AOSX2X
    Explore at: Croissant (Croissant is a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 24, 2025
    Dataset provided by
    Borealis
    Authors
    Jonathan Marotta
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    According to Motor Simulation Theory, cognitive states such as kinesthetic motor imagery activate the motor system in a way similar to overt motor execution. The action simulation involved in motor imagery can be triggered implicitly when individuals unconsciously simulate an action, as is the case in the Hand Laterality Judgement Task (HLJT). Studies employing the HLJT often use various depictions of hands, which may influence behavioural measures such as response times. The present study recruited 70 younger adults who mentally simulated both realistic and line-drawing representations of hands using the HLJT. The results indicated three effects:

    (1) mental transformations were quicker with line-drawing depictions than with realistic hands;
    (2) response times were faster for the back of the hand than for the palm;
    (3) when line drawings were compared with realistic hands, line drawings gave quicker response times for the 0° and 90°L orientations.

    The results suggest that, compared with line drawings, realistic hands elicit slower response times for both simple (0°) and challenging (90°L) mental transformations. Overall, behavioural measures may vary between realistic hands and line drawings, underscoring the importance of considering this distinction when using the HLJT.

  16. Data for graph Illus. 4.31. Pottery deposition patterns in different context...

    • repository.cam.ac.uk
    ods
    Updated Nov 10, 2017
    Evans, Jeremy; Mills, Philip (2017). Data for graph Illus. 4.31. Pottery deposition patterns in different context types (for comparison with Hayton/Shiptonthorpe) by Phase, shown by relative frequencies of sherds. [Dataset]. http://doi.org/10.17863/CAM.14508
    Available download formats: ods (3739 bytes)
    Dataset updated
    Nov 10, 2017
    Dataset provided by
    Apollo
    University of Cambridge
    Authors
    Evans, Jeremy; Mills, Philip
    License

    https://www.rioxx.net/licenses/all-rights-reserved/

    Description

    Data for graph Illus. 4.31. Pottery deposition patterns in different context types (for comparison with Hayton/Shiptonthorpe) by Phase, shown by relative frequencies of sherds. Context types used are those in common with Hayton and Shiptonthorpe publications.

  17. Sagebrush Types, Soil Regime Classes, and Fire Frequencies in Greater...

    • data.usgs.gov
    • datadiscoverystudio.org
    Updated May 20, 2016
    Matthew Brooks; John Matchett (2016). Sagebrush Types, Soil Regime Classes, and Fire Frequencies in Greater Sage-grouse Population Areas of the Colorado Plateau (1984-2013) [Dataset]. http://doi.org/10.5066/F76971N5
    Dataset updated
    May 20, 2016
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Matthew Brooks; John Matchett
    License

    U.S. Government Works, https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 1984 - Dec 31, 2013
    Area covered
    Colorado Plateau
    Description

    This three-band, 30-m resolution raster contains sagebrush vegetation types, soil temperature/moisture regime classes, and large fire frequencies across greater sage-grouse population areas within the Colorado Plateau sage-grouse management zone. Sagebrush vegetation types were defined by grouping similar vegetation types from the LANDFIRE biophysical settings layer. Soil moisture and temperature regimes were taken from a USDA-NRCS analysis of soil types across the greater sage-grouse range. Fire frequencies were derived from fire severity rasters created by the Monitoring Trends in Burn Severity program. The area of analysis included the greater sage-grouse population areas within specific management zones. Methods used to derive these data are detailed in the report [Brooks, M.L., Matchett, J.R., Shinneman, D.J., and Coates, P.S., 2015, Fire patterns in the range of greater sage-grouse, 1984-2013; Implications for conservation and management: U.S. Geological Survey Open-Fil ...
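    Each pixel of a 30-m raster covers 900 m² (0.09 ha), so per-class areas can be obtained by counting pixels. The NumPy sketch below cross-tabulates the area of every combination of class codes in two bands. It assumes the raster has already been loaded as a (bands, rows, cols) integer array. It also assumes the bands follow the order listed in the description (vegetation type, soil regime class, fire frequency). That band order and all class codes in the example are hypothetical.

```python
import numpy as np

CELL_SIZE_M = 30.0                                   # 30-m pixels, per the description
CELL_AREA_HA = CELL_SIZE_M * CELL_SIZE_M / 10_000    # 900 m² = 0.09 ha


def area_by_class_pair(raster, band_a, band_b):
    """Area in hectares for every (code in band_a, code in band_b) pair.

    `raster` is a (bands, rows, cols) integer array. Which band holds which
    variable, and what the codes mean, must be checked in the metadata.
    """
    a = raster[band_a].ravel()
    b = raster[band_b].ravel()
    # Unique (a, b) code pairs and how many pixels carry each pair.
    pairs, counts = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    return {(int(pa), int(pb)): float(n) * CELL_AREA_HA
            for (pa, pb), n in zip(pairs, counts)}


# Tiny 3-band, 2 x 2 example with hypothetical class codes:
# band 0 = vegetation type, band 1 = soil regime, band 2 = number of large fires.
raster = np.array([
    [[1, 1], [2, 2]],
    [[10, 10], [10, 20]],
    [[0, 3], [3, 3]],
])
print(area_by_class_pair(raster, 0, 2))
```

    A real raster would first be read with a GIS library. Any nodata value should be masked out before counting so that it does not appear as a class.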

  18. Data from: Response of soil fungal communities and their co-occurrence...

    • data-staging.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Jun 21, 2024
    Anjing Jiang (2024). Response of soil fungal communities and their co-occurrence patterns to grazing exclusion in different grassland types [Dataset]. http://doi.org/10.5061/dryad.bcc2fqzn0
    Available download formats: zip
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Xinjiang Agricultural University
    Authors
    Anjing Jiang
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Overgrazing and climate change are the main causes of grassland degradation, and grazing exclusion is one of the most common measures for restoring degraded grasslands worldwide. Soil fungi can respond rapidly to environmental stresses, but how different grassland types respond to grazing control has not been consistently determined. Three grassland types (temperate desert, temperate steppe grassland, and mountain meadow) that had been closed to grazing for nine years were used to study the effects of grazing exclusion on soil nutrients and fungal community structure. The results were as follows:

    (1) In the 0–5 cm soil layer, grazing exclusion significantly affected the soil water content of all three grassland types (P<0.05), and the pH, total phosphorus (TP), and nitrogen-to-phosphorus ratio (N/P) changed significantly in all three grassland types (P<0.05). In the 5–10 cm soil layer, significant changes in soil nutrients after grazing exclusion occurred in the mountain meadow grasslands (P<0.05), but not in the temperate desert and temperate steppe grasslands.
    (2) Across grassland types, Archaeorhizomycetes was most abundant in the mountain meadows. Dothideomycetes was most abundant in the temperate desert grasslands, where it was significantly more abundant than in the other two grassland types (P<0.05). Grazing exclusion led to non-significant changes in the dominant soil fungal phyla and in α diversity, but to significant changes in the β diversity of soil fungi (P<0.05).
    (3) Grazing exclusion areas have higher mean clustering coefficients and modularity classes than grazed areas. The highest modularity class is found in the grazing exclusion areas of the temperate steppe grassland.
    (4) pH is the main driving factor affecting soil fungal community structure, and plant coverage is a key environmental factor affecting soil community composition. Grazing exclusion affects soil fungal communities indirectly, by affecting soil nutrients.

    These results suggest that grazing exclusion may regulate microbial ecological processes by changing soil fungal β diversity in the three grassland types. Grazing exclusion is not conducive to the recovery of soil nutrients in mountain meadow areas, but it improves the stability of soil fungi in temperate steppe grassland. Therefore, the type of degraded grassland should be considered when formulating restoration programmes that rely on grazing exclusion. The results of this study provide new insights into the response of soil fungal communities to grazing exclusion, and a theoretical basis for managing the restoration of degraded grassland.
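    The mean clustering coefficient in result (3) is a standard network statistic. For each node, it is the fraction of pairs of the node's neighbours that are themselves linked, and these fractions are averaged over nodes. A dependency-free sketch follows. It assumes the co-occurrence network has already been reduced to a list of undirected edges; the way taxa were linked in the study is not stated here. Like networkx.average_clustering, it scores nodes with fewer than two neighbours as 0. Only nodes that appear in at least one edge are counted.

```python
from itertools import combinations


def mean_clustering(edges):
    """Average local clustering coefficient of an undirected graph.

    Nodes with fewer than two neighbours contribute 0 to the average.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among this node's neighbours (closed triangles).
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# A triangle A-B-C with a pendant node D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(round(mean_clustering(edges), 4))  # 0.5833
```

    Comparing this value between grazed and grazing-exclusion networks built the same way is what a statement like result (3) rests on. Modularity requires a separate community-detection step and is not sketched here.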

  19. Data from: Pattern Formation and Self-Organization in a Simple Precipitation...

    • acs.figshare.com
    zip
    Updated Jun 2, 2023
    András Volford; Ferenc Izsák; Mátyás Ripszám; István Lagzi (2023). Pattern Formation and Self-Organization in a Simple Precipitation System [Dataset]. http://doi.org/10.1021/la0623432.s001
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    András Volford; Ferenc Izsák; Mátyás Ripszám; István Lagzi
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Various types of pattern formation and self-organization phenomena can be observed in biological, chemical, and geochemical systems as a result of the interaction of reaction with diffusion. Static precipitation patterns were first reported by Liesegang in 1896. Traveling waves and dynamically changing patterns can also exist in reaction-diffusion systems: the Belousov-Zhabotinsky reaction provides a classic example of these phenomena. Until now, no experimental evidence had been found for such dynamic patterns in precipitation systems. We investigate pattern formation resulting from the coupling of a precipitation front with traveling waves in a new, simple reaction-diffusion system based on the precipitation and complex formation of aluminum hydroxide. We report a unique kind of self-organization: traveling waves and spirals appear spontaneously inside a precipitation front. The newly designed system is simple, requiring just two inorganic reactants and a simple experimental setup, yet dynamically changing patterns can be observed in it. This work could open a new perspective on precipitation pattern formation and geochemical self-organization.

  20. Data from: High-Resolution Maps of Material Stocks in Buildings and...

    • acs.figshare.com
    xlsx
    Updated Jun 11, 2023
    Helmut Haberl; Dominik Wiedenhofer; Franz Schug; David Frantz; Doris Virág; Christoph Plutzar; Karin Gruhler; Jakob Lederer; Georg Schiller; Tomer Fishman; Maud Lanau; Andreas Gattringer; Thomas Kemper; Gang Liu; Hiroki Tanikawa; Sebastian van der Linden; Patrick Hostert (2023). High-Resolution Maps of Material Stocks in Buildings and Infrastructures in Austria and Germany [Dataset]. http://doi.org/10.1021/acs.est.0c05642.s002
    Available download formats: xlsx
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    ACS Publications
    Authors
    Helmut Haberl; Dominik Wiedenhofer; Franz Schug; David Frantz; Doris Virág; Christoph Plutzar; Karin Gruhler; Jakob Lederer; Georg Schiller; Tomer Fishman; Maud Lanau; Andreas Gattringer; Thomas Kemper; Gang Liu; Hiroki Tanikawa; Sebastian van der Linden; Patrick Hostert
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Germany, Austria
    Description

    The dynamics of societal material stocks such as buildings and infrastructures, and their spatial patterns, drive surging resource use and emissions. Two main types of data are currently used to map stocks: night-time lights (NTL) from Earth-observing (EO) satellites, and cadastral information. We present an alternative approach for broad-scale material stock mapping based on freely available high-resolution EO imagery and OpenStreetMap data. Maps of built-up surface area, building height, and building types were derived from optical Sentinel-2 and radar Sentinel-1 satellite data to map patterns of material stocks for Austria and Germany. Using material intensity factors, we calculated the mass of different types of buildings and infrastructures, distinguishing eight types of materials, at 10 m spatial resolution. The total mass of buildings and infrastructures in 2018 amounted to ∼5 Gt in Austria and ∼38 Gt in Germany (AT: ∼540 t/cap, DE: ∼450 t/cap). Cross-checks with independent data sources at various scales suggested that the method may yield more complete results than other data sources, although possible overestimations could not be ruled out. The method yields thematic differentiations not possible with NTL, avoids the use of costly cadastral data, and is suitable for mapping larger areas and tracing trends over time.
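    One common way to apply material intensity factors to such maps is to multiply building volume (built-up area times height) by a per-material intensity in kg/m³, then sum over cells. The sketch below does this for buildings only. The building-type labels and all intensity values are hypothetical placeholders, not the factors used in the study. Infrastructure such as roads, usually handled with per-area intensities, is left out.

```python
from dataclasses import dataclass

# Hypothetical material intensities in kg per m³ of building volume.
# Illustrative values only; they are NOT the factors used in the study.
MATERIAL_INTENSITY = {
    "residential": {"concrete": 400.0, "bricks": 150.0, "steel": 20.0},
    "commercial": {"concrete": 500.0, "bricks": 50.0, "steel": 40.0},
}


@dataclass
class Cell:
    built_up_m2: float    # built-up area inside one 10 m x 10 m cell (<= 100 m²)
    height_m: float       # mean building height in the cell
    building_type: str    # label from the building-type map (hypothetical labels)


def cell_mass_kg(cell):
    """Mass per material for one cell: volume (area x height) x intensity."""
    volume_m3 = cell.built_up_m2 * cell.height_m
    intensities = MATERIAL_INTENSITY[cell.building_type]
    return {material: volume_m3 * mi for material, mi in intensities.items()}


def total_mass_t(cells):
    """Sum per-material mass over all cells, converted from kg to tonnes."""
    totals = {}
    for cell in cells:
        for material, kg in cell_mass_kg(cell).items():
            totals[material] = totals.get(material, 0.0) + kg / 1000.0
    return totals


cells = [Cell(100.0, 10.0, "residential"), Cell(50.0, 20.0, "commercial")]
print(total_mass_t(cells))
```

    Dividing such a national total by population gives per-capita figures of the kind quoted above (t/cap). The numbers produced here are not comparable to the study's, because the intensities are invented.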

Nick Malleson; Wouter Steenbeek; Martin A. Andresen (2023). Identifying the appropriate spatial resolution for the analysis of crime patterns [Dataset]. http://doi.org/10.1371/journal.pone.0218324

Identifying the appropriate spatial resolution for the analysis of crime patterns

16 scholarly articles cite this dataset (Google Scholar)
Available download formats: zip
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Nick Malleson; Wouter Steenbeek; Martin A. Andresen
License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Background: A key issue in the analysis of many spatial processes is the choice of an appropriate scale for the analysis. Smaller geographical units are generally preferable for the study of human phenomena because they are less likely to cause heterogeneous groups to be conflated. However, it can be harder to obtain data for small units, and small-number problems can frustrate quantitative analysis. This research presents a new approach that can be used to estimate the most appropriate scale at which to aggregate point data to areas.

Data and methods: The proposed method works by creating a number of regular grids with iteratively smaller cell sizes (increasing grid resolution) and estimating the similarity between two realisations of the point pattern at each resolution. The method is applied first to simulated point patterns and then to real, publicly available crime data from the city of Vancouver, Canada. The crime types tested are residential burglary, commercial burglary, theft from vehicle and theft of bike.

Findings: The results provide evidence for the size of spatial unit that is most appropriate for the different types of crime studied. Importantly, the results depend on both the number of events in the data and the degree of spatial clustering, so a single ‘appropriate’ scale is not identified. The method is nevertheless useful as a means of better estimating what spatial scale might be appropriate for a particular piece of analysis.
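The grid-refinement idea can be sketched in a few lines of NumPy. Each realisation is counted on an n × n grid for increasing n, and the two count surfaces are compared at each resolution. The similarity measure used here, histogram intersection of the normalised counts, is a stand-in chosen for simplicity. It is not necessarily the index used by the authors. The two "realisations" are hypothetical samples from the same synthetic clustered process (clustered_sample is an illustrative helper, not part of the study).

```python
import numpy as np


def grid_counts(points, bounds, n):
    """Count points falling in each cell of an n x n regular grid over `bounds`."""
    (xmin, xmax), (ymin, ymax) = bounds
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=n,
                                  range=[[xmin, xmax], [ymin, ymax]])
    return counts


def similarity(a, b):
    """Histogram intersection of two count grids: 1.0 = identical proportions."""
    pa = a / a.sum()
    pb = b / b.sum()
    return float(np.minimum(pa, pb).sum())


def similarity_by_resolution(points_a, points_b, bounds, resolutions):
    """Similarity between two realisations for each grid resolution n."""
    return {n: similarity(grid_counts(points_a, bounds, n),
                          grid_counts(points_b, bounds, n))
            for n in resolutions}


def clustered_sample(rng, k, centres, spread=1.0, extent=10.0):
    """Hypothetical clustered process: Gaussian blobs around fixed centres."""
    idx = rng.integers(0, len(centres), size=k)
    pts = centres[idx] + rng.normal(0.0, spread, size=(k, 2))
    return np.clip(pts, 0.0, extent)  # keep every point inside the study area


rng = np.random.default_rng(0)
centres = np.array([[2.0, 2.0], [7.0, 6.0]])
a = clustered_sample(rng, 3000, centres)
b = clustered_sample(rng, 3000, centres)
bounds = ((0.0, 10.0), (0.0, 10.0))

for n, s in similarity_by_resolution(a, b, bounds, [2, 4, 8, 16, 32, 64]).items():
    print(f"{n:>3} x {n:<3} grid: similarity {s:.3f}")
```

With real crime data, the two realisations might be, for example, two time periods or a random split of the events. This is an assumption; the description does not say how they are formed. As cells shrink, counts per cell fall and similarity tends to drop. The resolution at which it falls off sharply is one way to gauge how fine a grid the data can support.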
