57 datasets found
  1. Data from: World Database on Protected Areas

    • kiribati-data.sprep.org
    • pacificdata.org
    • +13 more
    jpg, pdf
    Updated Feb 20, 2025
    + more versions
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). World Database on Protected Areas [Dataset]. https://kiribati-data.sprep.org/dataset/world-database-protected-areas-0
    Explore at:
    jpg (577876), pdf (2100272). Available download formats
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region
    Description

    The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas. It is updated on a monthly basis and is one of the key global biodiversity data sets widely used by scientists, businesses, governments, international secretariats and others to inform planning, policy decisions and management.

    The WDPA is a joint project between UN Environment and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by the UN Environment World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. Monthly updates of the data are made available online through the Protected Planet website, where the data is both viewable and downloadable.

    Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and to other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets.

    Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.

  2. Global Green Economy Index (GGEI)

    • kaggle.com
    Updated May 8, 2024
    Cite
    Jeremy Tamanini (2024). Global Green Economy Index (GGEI) [Dataset]. https://www.kaggle.com/datasets/jeremytamanini/global-green-economy-index-ggei
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 8, 2024
    Dataset provided by
    Kaggle
    Authors
    Jeremy Tamanini
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    For the first time, the full results from the Global Green Economy Index (GGEI) are available in the public domain. Historically, only the aggregate results have been publicly accessible; the full dataset has been paywalled and accessible to our subscribers only. But the way in which we release GGEI data to the public is changing. Read on for a quick explanation of how and why.

    First, the how. The GGEI file publicly accessible today represents the dataset as officially compiled in 2022. It contains the full results for each of the 18 indicators in the GGEI for 160 countries, across the four main dimensions of climate change & social equity, sector decarbonization, markets & ESG investment and the environment. Some (not all) of these data points have since been updated, as new datasets have been published. The GGEI is a dynamic model, updating in real-time as new data becomes available. Our subscribing clients will still receive this most timely version of the model, along with any customizations they may request.

    Now, the why. First and foremost, there is huge demand among academic researchers globally for the full GGEI dataset. Academic inquiry around the green transition, sustainable development, ESG investing, and green energy systems has exploded over the past several years. We receive hundreds of inquiries annually from students and researchers seeking access to the full GGEI dataset. Making it publicly accessible, as we are today, makes it easier for these individuals and institutions to use the GGEI to promote learning and green progress within their institutions.

    More broadly, the landscape for data has changed significantly. A decade ago, when the GGEI was first published, datasets existed more in silos, and users might subscribe to one specific dataset like the GGEI to answer a specific question. But today, data usage in the sustainability space has become much more of a system, whereby myriad data sources are synthesized into increasingly sophisticated models, often fueled by artificial intelligence. Making the GGEI more accessible will accelerate how this perspective on the global green economy can be integrated into these systems.

  3. Protected Areas Database of the United States (PAD-US) 2.1 - World Database...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 2.1 - World Database on Protected Areas (WDPA) Submission (ver. 1.1, April 2021) [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1-world-database-on-protected-areas
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    The United States Geological Survey (USGS) - Science Analytics and Synthesis (SAS) - Gap Analysis Project (GAP) manages the Protected Areas Database of the United States (PAD-US), an Arc10x geodatabase that includes a full inventory of areas dedicated to the preservation of biological diversity and to other natural, recreation, historic, and cultural uses, managed for these purposes through legal or other effective means (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/protected-areas). The PAD-US is developed in partnership with many organizations, including coordination groups at the [U.S.] Federal level, lead organizations for each State, and a number of national and other non-governmental organizations whose work is closely related to the PAD-US. Learn more about the USGS PAD-US partners program here: www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards. The United Nations Environment Programme - World Conservation Monitoring Centre (UNEP-WCMC) tracks global progress toward biodiversity protection targets enacted by the Convention on Biological Diversity (CBD) through the World Database on Protected Areas (WDPA) and the World Database on Other Effective Area-based Conservation Measures (WD-OECM), available at: www.protectedplanet.net. See the Aichi Target 11 dashboard (www.protectedplanet.net/en/thematic-areas/global-partnership-on-aichi-target-11) for official protection statistics recognized globally and developed for the CBD, or here for more information and statistics on the United States of America's protected areas: www.protectedplanet.net/country/USA.
    It is important to note that statistics published by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas (MPA) Center (www.marineprotectedareas.noaa.gov/dataanalysis/mpainventory/) and the USGS-GAP (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-statistics-and-reports) differ from statistics published by the UNEP-WCMC, as methods to remove overlapping designations differ slightly and U.S. Territories are reported separately by the UNEP-WCMC (e.g., the largest MPA, "Pacific Remote Islands Marine Monument", is attributed to the United States Minor Outlying Islands statistics). At the time of PAD-US 2.1 publication (USGS-GAP, 2020), NOAA reported 26% of U.S. marine waters (including the Great Lakes) as protected in an MPA that meets the International Union for Conservation of Nature (IUCN) definition of biodiversity protection (www.iucn.org/theme/protected-areas/about). USGS-GAP plans to publish PAD-US 2.1 Statistics and Reports in the spring of 2021. The relationship between the USGS, the NOAA, and the UNEP-WCMC is as follows:
    • USGS manages and publishes the full inventory of U.S. marine and terrestrial protected areas data in the PAD-US representing many values, developed in collaboration with a partnership network in the U.S.;
    • USGS is the primary source of U.S. marine and terrestrial protected areas data for the WDPA, developed from a subset of the PAD-US in collaboration with the NOAA, other agencies and non-governmental organizations in the U.S., and the UNEP-WCMC;
    • UNEP-WCMC is the authoritative source of global protected area statistics from the WDPA and WD-OECM;
    • NOAA is the authoritative source of MPA data in the PAD-US and MPA statistics in the U.S.;
    • USGS is the authoritative source of PAD-US statistics (including areas primarily managed for biodiversity, multiple uses including natural resource extraction, and public access).
    The PAD-US 2.1 Combined Marine, Fee, Designation, Easement feature class (GAP Status Code 1 and 2 only) is the source of protected areas data in this WDPA update. Tribal areas and military lands represented in the PAD-US Proclamation feature class as GAP Status Code 4 (no known mandate for biodiversity protection) are not included, as spatial data representing internal protected areas are not available at this time. The USGS submitted more than 42,900 protected areas from PAD-US 2.1, including all 50 U.S. States and 6 U.S. Territories, to the UNEP-WCMC for inclusion in the May 2021 WDPA, available at www.protectedplanet.net. The NOAA is the sole source of MPAs in PAD-US, and the National Conservation Easement Database (NCED, www.conservationeasement.us/) is the source of conservation easements. The USGS aggregates authoritative federal lands data directly from managing agencies for PAD-US (www.communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/), while a network of State data-stewards provide state and local government lands and some land trust preserves. National nongovernmental organizations contribute spatial data directly (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards). The USGS translates the biodiversity focused subset of PAD-US into the WDPA schema (UNEP-WCMC, 2019) for efficient aggregation by the UNEP-WCMC. The USGS maintains WDPA Site Identifiers (WDPAID, WDPA_PID), a persistent identifier for each protected area, provided by UNEP-WCMC. Agency partners are encouraged to track WDPA Site Identifier values in source datasets to improve the efficiency and accuracy of PAD-US and WDPA updates. The IUCN protected areas in the U.S.
are managed by thousands of agencies and organizations across the country and include over 42,900 designated sites such as National Parks, National Wildlife Refuges, National Monuments, Wilderness Areas, some State Parks, State Wildlife Management Areas, Local Nature Preserves, City Natural Areas, The Nature Conservancy and other Land Trust Preserves, and Conservation Easements. The boundaries of these protected places (some overlap) are represented as polygons in the PAD-US, along with informative descriptions such as Unit Name, Manager Name, and Designation Type. As the WDPA is a global dataset, their data standards (UNEP-WCMC 2019) require simplification to reduce the number of records included, focusing on the protected area site name and management authority as described in the Supplemental Information section in this metadata record. Given the numerous organizations involved, sites may be added or removed from the WDPA between PAD-US updates. These differences may reflect actual change in protected area status; however, they also reflect the dynamic nature of spatial data or Geographic Information Systems (GIS). Many agencies and non-governmental organizations are working to improve the accuracy of protected area boundaries, the consistency of attributes, and inventory completeness between PAD-US updates. In addition, USGS continually seeks partners to review and refine the assignment of conservation measures in the PAD-US.
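
    The translation of the biodiversity-focused subset of PAD-US into the WDPA schema described above can be pictured as a simple attribute mapping. The sketch below is purely illustrative: WDPAID and WDPA_PID are named in the text, the Unit Name / Manager Name / Designation Type attributes come from the PAD-US description above, but the field spellings, sample record, and helper function are all invented for this example.

```python
# Hypothetical sketch: map one PAD-US record into a few WDPA-schema fields.
# WDPAID/WDPA_PID are the persistent identifiers mentioned in the text;
# the PAD-US-side keys and the sample record are invented for illustration.
def padus_to_wdpa(rec: dict) -> dict:
    return {
        "WDPAID": rec["wdpa_id"],              # persistent WDPA site identifier
        "WDPA_PID": rec["wdpa_pid"],           # persistent parcel identifier
        "NAME": rec["unit_name"],              # PAD-US Unit Name
        "MANG_AUTH": rec["manager_name"],      # PAD-US Manager Name
        "DESIG_ENG": rec["designation_type"],  # PAD-US Designation Type
    }

sample = {
    "wdpa_id": 12345,
    "wdpa_pid": "12345_A",
    "unit_name": "Example Wilderness Area",
    "manager_name": "Example State Agency",
    "designation_type": "Wilderness Area",
}
print(padus_to_wdpa(sample)["NAME"])  # Example Wilderness Area
```

    Tracking the WDPA identifiers alongside the source attributes, as the text recommends, is what lets each PAD-US update overwrite the right WDPA records.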

  4. Data for "Prediction of Search Targets From Fixations in Open-World...

    • darus.uni-stuttgart.de
    Updated Oct 28, 2022
    Cite
    Andreas Bulling (2022). Data for "Prediction of Search Targets From Fixations in Open-World Settings" [Dataset]. http://doi.org/10.18419/DARUS-3226
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 28, 2022
    Dataset provided by
    DaRUS
    Authors
    Andreas Bulling
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    World
    Dataset funded by
    DFG
    Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University
    Description

    We designed a human study to collect fixation data during visual search. We opted for a task that involved searching for a single image (the target) within a synthesised collage of images (the search set). Each collage is a random permutation of a finite set of images. To explore the impact of the similarity in appearance between target and search set on both fixation behaviour and automatic inference, we created three different search tasks covering a range of similarities. In prior work, colour was found to be a particularly important cue for guiding search to targets and target-similar objects. Therefore, for the first task we selected 78 coloured O'Reilly book covers to compose the collages. These covers show a woodcut of an animal at the top and the title of the book in a characteristic font underneath. Given that overall cover appearance was very similar, this task allows us to analyse fixation behaviour when colour is the most discriminative feature. For the second task we used a set of 84 book covers from Amazon. In contrast to the first task, the appearance of these covers is more diverse. This makes it possible to analyse fixation behaviour when both structure and colour information could be used by participants to find the target. Finally, for the third task, we used a set of 78 mugshots from a public database of suspects. In contrast to the other tasks, we transformed the mugshots to grey-scale so that they did not contain any colour information. This allows analysis of fixation behaviour when colour information was not available at all. We found faces to be particularly interesting given the relevance of searching for faces in many practical applications. Eighteen participants (9 male), aged 18-30, took part. Gaze data were recorded with a stationary Tobii TX300 eye tracker. More information about the dataset can be found in the README file.

  5. Learning Poverty Global Database

    • data360.worldbank.org
    Updated Apr 18, 2025
    + more versions
    Cite
    (2025). Learning Poverty Global Database [Dataset]. https://data360.worldbank.org/en/dataset/WB_LPGD
    Explore at:
    Dataset updated
    Apr 18, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2001 - 2023
    Description

    Will all children be able to read by 2030? The ability to read with comprehension is a foundational skill that every education system around the world strives to impart by late in primary school—generally by age 10. Moreover, attaining the ambitious Sustainable Development Goals (SDGs) in education requires first achieving this basic building block, and so does improving countries’ Human Capital Index scores. Yet past evidence from many low- and middle-income countries has shown that many children are not learning to read with comprehension in primary school. To understand the global picture better, we have worked with the UNESCO Institute for Statistics (UIS) to assemble a new dataset with the most comprehensive measures of this foundational skill yet developed, by linking together data from credible cross-national and national assessments of reading. This dataset covers 115 countries, accounting for 81% of children worldwide and 79% of children in low- and middle-income countries. The new data allow us to estimate the reading proficiency of late-primary-age children, and we also provide what are among the first estimates (and the most comprehensive, for low- and middle-income countries) of the historical rate of progress in improving reading proficiency globally (for the 2000-17 period). The results show that 53% of all children in low- and middle-income countries cannot read age-appropriate material by age 10, and that at current rates of improvement, this “learning poverty” rate will have fallen only to 43% by 2030. Indeed, we find that the goal of all children reading by 2030 will be attainable only with historically unprecedented progress. The high rate of “learning poverty” and slow progress in low- and middle-income countries is an early warning that all the ambitious SDG targets in education (and likely of social progress) are at risk. 
    Based on this evidence, we suggest a new medium-term target to guide the World Bank's work in low- and middle-income countries: cut learning poverty by at least half by 2030. This target, together with improved measurement of learning, can serve as an evidence-based tool to accelerate progress toward getting all children reading by age 10.

    For further details, please refer to https://thedocs.worldbank.org/en/doc/e52f55322528903b27f1b7e61238e416-0200022022/original/Learning-poverty-report-2022-06-21-final-V7-0-conferenceEdition.pdf

  6. Datasets used for model learning and validation.

    • figshare.com
    xls
    Updated Mar 26, 2024
    Cite
    Jialiang Sun; Jun Guo; Jian Liu (2024). Datasets used for model learning and validation. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011972.t001
    Explore at:
    xls. Available download formats
    Dataset updated
    Mar 26, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    Jialiang Sun; Jun Guo; Jian Liu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Using the CRISPR-Cas9 system to perform base substitutions at the target site is a typical technique for genome editing with potential applications in gene therapy and agricultural productivity. When the CRISPR-Cas9 system uses guide RNA to direct the Cas9 endonuclease to the target site, it may misdirect it to a potential off-target site, resulting in unintended genome edits. Although several computational methods have been proposed to predict off-target effects, there is still room for improvement in off-target effect prediction capability. In this paper, we present an effective approach called CRISPR-M with a new encoding scheme and a novel multi-view deep learning model to predict the sgRNA off-target effects for target sites containing indels and mismatches. CRISPR-M takes advantage of convolutional neural networks and bidirectional long short-term memory recurrent neural networks to construct a three-branch network towards multi-views. Compared with existing methods, CRISPR-M demonstrates significant performance advantages on real-world datasets. Furthermore, experimental analysis of CRISPR-M under multiple metrics reveals its capability to extract features and validates its superiority on sgRNA off-target effect predictions.
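
    The description mentions an encoding scheme for sgRNA/target pairs containing mismatches and indels. As a rough illustration of the general idea only (this is a generic one-hot pairwise encoding sketched by the editor, not CRISPR-M's actual scheme):

```python
import numpy as np

BASES = "ACGT"

def encode_pair(sgrna: str, dna: str) -> np.ndarray:
    """One-hot encode an aligned sgRNA/DNA target pair, position by position.

    Each aligned position becomes an 8-dim vector: 4 dims for the sgRNA
    base and 4 for the DNA base. A mismatch shows up as two different
    one-hot halves; a '-' (indel gap) leaves its half all zero.
    """
    assert len(sgrna) == len(dna), "sequences must be aligned to equal length"
    out = np.zeros((len(sgrna), 8), dtype=np.float32)
    for i, (g, d) in enumerate(zip(sgrna.upper(), dna.upper())):
        if g in BASES:
            out[i, BASES.index(g)] = 1.0
        if d in BASES:
            out[i, 4 + BASES.index(d)] = 1.0
    return out

# A 1-bp gap ('-') in the sgRNA aligned against the DNA target:
pair = encode_pair("GACT-A", "GACTGA")
print(pair.shape)  # (6, 8)
```

    A matrix like this is the kind of input a convolutional or recurrent branch can consume; CRISPR-M's own encoding and network architecture are described in the paper.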

  7. Data from: FISBe: A real-world benchmark dataset for instance segmentation...

    • zenodo.org
    • data.niaid.nih.gov
    bin, json +3
    Updated Apr 2, 2024
    Cite
    Lisa Mais; Peter Hirsch; Claire Managan; Ramya Kandarpa; Josef Lorenz Rumberger; Annika Reinke; Lena Maier-Hein; Gudrun Ihrke; Dagmar Kainmueller (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. http://doi.org/10.5281/zenodo.10875063
    Explore at:
    zip, text/x-python, bin, json, txt. Available download formats
    Dataset updated
    Apr 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lisa Mais; Peter Hirsch; Claire Managan; Ramya Kandarpa; Josef Lorenz Rumberger; Annika Reinke; Lena Maier-Hein; Gudrun Ihrke; Dagmar Kainmueller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Feb 26, 2024
    Description

    General

    For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

    Summary

    • A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
      • 30 completely labeled (segmented) images
      • 71 partly labeled images
      • altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes 30-60 min on average, yet a difficult one can take up to 4 hours)
    • To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
    • A set of metrics and a novel ranking score for respective meaningful method benchmarking
    • An evaluation of three baseline methods in terms of the above metrics and score

    Abstract

    Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

    Dataset documentation:

    We provide detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

    >> FISBe Datasheet

    Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

    Files

    • fisbe_v1.0_{completely,partly}.zip
      • contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.
    • fisbe_v1.0_mips.zip
      • maximum intensity projections of all samples, for convenience.
    • sample_list_per_split.txt
      • a simple list of all samples and the subset they are in, for convenience.
    • view_data.py
      • a simple python script to visualize samples, see below for more information on how to use it.
    • dim_neurons_val_and_test_sets.json
      • a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.
    • Readme.md
      • general information

    How to work with the image files

    Each sample consists of a single 3d MCFO image of neurons of the fruit fly.
    For each image, we provide a pixel-wise instance segmentation for all separable neurons.
    Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification).
    The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.
    The segmentation mask for each neuron is stored in a separate channel.
    The order of dimensions is CZYX.

    We recommend working in a virtual environment, e.g., by using conda:

    conda create -y -n flylight-env -c conda-forge python=3.9
    conda activate flylight-env

    How to open zarr files

    1. Install the python zarr package:
      pip install zarr
    2. Open a zarr file with:

      import zarr
      # "path/to/sample.zarr" stands in for one of the provided zarr files
      raw = zarr.open("path/to/sample.zarr", mode='r', path="volumes/raw")
      seg = zarr.open("path/to/sample.zarr", mode='r', path="volumes/gt_instances")

      # optional: load the array fully into memory as a numpy array
      import numpy as np
      raw_np = np.array(raw)

    Zarr arrays are read lazily on-demand.
    Many functions that expect numpy arrays also work with zarr arrays.
    Optionally, the arrays can also explicitly be converted to numpy arrays.

    How to view zarr image files

    We recommend using napari to view the image data.

    1. Install napari:
      pip install "napari[all]"
    2. Save the following Python script:

      import zarr, sys, napari

      raw = zarr.load(sys.argv[1], path="volumes/raw")
      gts = zarr.load(sys.argv[1], path="volumes/gt_instances")

      viewer = napari.Viewer(ndisplay=3)
      for idx, gt in enumerate(gts):
          viewer.add_labels(gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
      viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
      viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
      viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
      napari.run()

    3. Execute:
      python view_data.py <path/to/sample.zarr>

    Metrics

    • S: Average of avF1 and C
    • avF1: Average F1 Score
    • C: Average ground truth coverage
    • clDice_TP: Average true positives clDice
    • FS: Number of false splits
    • FM: Number of false merges
    • tp: Relative number of true positives

    For more information on our selected metrics and formal definitions please see our paper.
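
    Since S is defined in the list above as the average of avF1 and C, computing it is a one-liner; the values below are invented for illustration and are not FISBe benchmark results.

```python
def ranking_score(av_f1: float, coverage: float) -> float:
    """S: the average of the avF1 score and the ground-truth coverage C."""
    return (av_f1 + coverage) / 2.0

# Invented example values, not actual benchmark results.
print(ranking_score(0.5, 0.7))  # 0.6
```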

    Baseline

    To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al.
    For detailed information on the methods and the quantitative results please see our paper.

    License

    The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

    Citation

    If you use FISBe in your research, please use the following BibTeX entry:

    @misc{mais2024fisbe,
     title =    {FISBe: A real-world benchmark dataset for instance
             segmentation of long-range thin filamentous structures},
     author =    {Lisa Mais and Peter Hirsch and Claire Managan and Ramya
             Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena
             Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
     year =     2024,
     eprint =    {2404.00130},
     archivePrefix ={arXiv},
     primaryClass = {cs.CV}
    }

    Acknowledgments

    We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable
    discussions.
    P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.
    This work was co-funded by Helmholtz Imaging.

    Changelog

    There have been no changes to the dataset so far.
    All future changes will be listed on the changelog page.

    Contributing

    If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

    All contributions are welcome!

  8. On Premises Real Time Database Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). On Premises Real Time Database Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/on-premises-real-time-database-market
    Explore at:
    pdf, csv, pptx. Available download formats
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    On Premises Real Time Database Market Outlook



    The global market size for on-premises real-time database solutions was estimated at USD 12.3 billion in 2023, and it is projected to reach USD 25.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.6% during the forecast period. This growth is driven by several factors, including the increasing need for efficient data management and real-time data analytics capabilities across various industry verticals such as BFSI, healthcare, retail, and manufacturing.



    One of the primary growth factors for the on-premises real-time database market is the increasing volume of data generated by organizations. With the proliferation of IoT devices, social media platforms, and e-commerce activities, the amount of data generated is growing exponentially. Organizations are increasingly looking for robust database solutions that can handle real-time data processing and analytics to gain actionable insights and maintain a competitive edge. This trend is particularly evident in sectors like retail and manufacturing, where real-time data can significantly enhance operational efficiency and customer experience.



    Another significant growth driver is the need for enhanced data security and compliance. While cloud-based solutions offer scalability and flexibility, many organizations, particularly in the BFSI and healthcare sectors, prefer on-premises databases due to stringent data security and compliance requirements. On-premises solutions provide organizations with greater control over their data, allowing them to implement tailored security measures and ensure compliance with industry-specific regulations such as GDPR, HIPAA, and others. This increased focus on data security is likely to continue driving the demand for on-premises real-time database solutions.



    The technological advancements in database management systems are also propelling market growth. Innovations such as in-memory databases, multi-model databases, and enhanced query processing capabilities are enabling organizations to achieve faster data retrieval and improved performance. Additionally, the integration of artificial intelligence and machine learning algorithms in database systems is providing advanced analytics capabilities, further enhancing the value proposition of on-premises real-time databases. These technological advancements are expected to attract more organizations to invest in on-premises solutions.



    Operational Database Management System (ODBMS) plays a pivotal role in the landscape of on-premises real-time databases. These systems are designed to handle a wide array of data management tasks, including transaction processing, data retrieval, and storage management, all in real-time. The efficiency of an ODBMS is crucial for businesses that require immediate access to data to make timely decisions. In sectors like finance and healthcare, where data accuracy and speed are paramount, the implementation of a robust ODBMS ensures that organizations can maintain high performance and reliability. Furthermore, with the integration of advanced features such as in-memory processing and multi-model support, ODBMS solutions are becoming increasingly sophisticated, offering enhanced capabilities to meet the growing demands of modern enterprises.



    Regionally, North America holds the largest market share due to the early adoption of advanced technologies and the presence of major industry players. The region's strong emphasis on data security and regulatory compliance also supports the adoption of on-premises solutions. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, driven by the rapid digital transformation initiatives, increasing IT investments, and the growing importance of real-time data analytics in emerging economies such as China and India.



    Component Analysis



    When analyzing the on-premises real-time database market by component, it is essential to consider the three main segments: software, hardware, and services. The software component, which includes database management systems and related applications, is the largest segment. This dominance is due to the critical role that software plays in managing, storing, and analyzing real-time data. Organizations are continually seeking advanced software solutions that offer enhanced performance, reliability, and scalability. Innovations in database software, such as in-memory processing and multi-model databases, continue to reinforce this segment's lead.

  9. ‘Climate Change Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 13, 2018
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Climate Change Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-climate-change-dataset-7e65/4a67af59/?iid=002-150&v=presentation
    Explore at:
    Dataset updated
    Dec 13, 2018
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Climate Change Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/climate-change-datae on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Data from World Development Indicators and Climate Change Knowledge Portal on climate systems, exposure to climate impacts, resilience, greenhouse gas emissions, and energy use.

    In addition to the data available here and through the Climate Data API, the Climate Change Knowledge Portal has a web interface to a collection of water indicators that may be used to assess the impact of climate change across over 8,000 water basins worldwide. You may use the web interface to download the data for any of these basins.

    Here is how to navigate to the water data:

    • Go to the Climate Change Knowledge Portal home page (http://climateknowledgeportal.worldbank.org/)
    • Click any region on the map
    • Click a country
    • In the navigation menu, click "Impacts" and then "Water"
    • Click the map to select a specific water basin
    • Click "Click here to get access to data and indicators"

    Please be sure to observe the disclaimers on the website regarding uncertainties and use of the water data.

    Attribution: Climate Change Data, World Bank Group.

    World Bank Data Catalog Terms of Use

    Source: http://data.worldbank.org/data-catalog/climate-change

    This dataset was created by the World Bank and contains around 10,000 samples, with per-year columns (e.g., 1993, 1994, 2009), a Series Code column, technical information, and other features.

    How to use this dataset

    • Analyze 1995 in relation to Scale
    • Study the influence of 1998 on Country Code
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit World Bank

    --- Original source retains full ownership of the source dataset ---

  10. ‘What The World Thinks Of Trump?’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘What The World Thinks Of Trump?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-what-the-world-thinks-of-trump-9a66/ef5d3ea5/?iid=007-925&v=presentation
    Explore at:
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Analysis of ‘What The World Thinks Of Trump?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/trump-world-truste on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    See Readme for more details.
    This repository contains a selection of the data -- and the data-processing scripts -- behind the articles, graphics and interactives at FiveThirtyEight.

    We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight and contains per-country columns such as Hungary, South Africa, Brazil, and Kenya, along with technical information and other features.

    How to use this dataset

    • Analyze Russia in relation to Japan
    • Study the influence of Uk on Spain
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  11. ‘50 Years Of World Cup Doppelgangers’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘50 Years Of World Cup Doppelgangers’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-50-years-of-world-cup-doppelgangers-c448/d5846ac8/?iid=003-442&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    Analysis of ‘50 Years Of World Cup Doppelgangers’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/world-cup-comparisonse on 28 January 2022.

    --- Dataset description provided by original source is as follows ---


    About this dataset

    This file contains links to the data behind 50 Years Of World Cup Doppelgangers.

    world_cup_comparisons.csv contains all historical players and their associated z-score for each of the 16 metrics.
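The z-scores mentioned above standardize each raw metric across players, so values from different eras are comparable. A minimal sketch of how such a standardization is computed (the sample numbers are made up for illustration; the real file covers 16 metrics):

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw metric values: z = (x - mean) / stdev."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# hypothetical raw values of one metric for four players
goals = [0.2, 0.5, 0.8, 1.1]
print(z_scores(goals))
```

A player's z-score then expresses how many standard deviations they sit above or below the mean for that metric.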

    The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight and contains around 6,000 samples, with per-metric z-score columns such as Fouls Z, Crosses Z, Clearances Z, and Blocks Z, along with technical information and other features.

    How to use this dataset

    • Analyze Boxtouches Z in relation to Fouled Z
    • Study the influence of Nsxg Z on Team
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  12. Statistically downscaled climate indices from CMIP6 global climate models...

    • ouvert.canada.ca
    • data.urbandatacentre.ca
    • +3more
    html, netcdf
    Updated Jan 28, 2025
    + more versions
    Cite
    Environment and Climate Change Canada (2025). Statistically downscaled climate indices from CMIP6 global climate models (CanDCS-U6 & CanDCS-M6) [Dataset]. https://ouvert.canada.ca/data/dataset/764720d5-8c0a-4e1e-93fc-d9e3eb0ab6b3
    Explore at:
    netcdf, htmlAvailable download formats
    Dataset updated
    Jan 28, 2025
    Dataset provided by
    Environment And Climate Change Canadahttps://www.canada.ca/en/environment-climate-change.html
    License

    Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 1951 - Dec 31, 2100
    Description

    Environment and Climate Change Canada’s (ECCC) Climate Research Division (CRD) and the Pacific Climate Impacts Consortium (PCIC) previously produced statistically downscaled climate scenarios based on simulations from climate models that participated in the Coupled Model Intercomparison Project phase 5 (CMIP5) in 2015. ECCC and PCIC have now updated the CMIP5-based downscaled scenarios with two new sets of downscaled scenarios based on the next generation of climate projections from the Coupled Model Intercomparison Project phase 6 (CMIP6). The scenarios are named Canadian Downscaled Climate Scenarios–Univariate method from CMIP6 (CanDCS-U6) and Canadian Downscaled Climate Scenarios–Multivariate method from CMIP6 (CanDCS-M6). CMIP6 climate projections are based on both updated global climate models and new emissions scenarios called “Shared Socioeconomic Pathways” (SSPs). Statistically downscaled datasets have been produced from 26 CMIP6 global climate models (GCMs) under three different emission scenarios (i.e., SSP1-2.6, SSP2-4.5, and SSP5-8.5), with PCIC later adding SSP3-7.0 to the CanDCS-M6 dataset. The CanDCS-U6 was downscaled using the Bias Correction/Constructed Analogues with Quantile mapping version 2 (BCCAQv2) procedure, and the CanDCS-M6 was downscaled using the N-dimensional Multivariate Bias Correction (MBCn) method. The CanDCS-U6 dataset was produced using the same downscaling target data (NRCANmet) as the CMIP5-based downscaled scenarios, while the CanDCS-M6 dataset implements a new target dataset (ANUSPLIN and PNWNAmet blended dataset). Statistically downscaled individual model output and ensembles are available for download. Downscaled climate indices are available across Canada at 10km grid spatial resolution for the 1950-2014 historical period and for the 2015-2100 period following each of the three emission scenarios. A total of 31 climate indices have been calculated using the CanDCS-U6 and CanDCS-M6 datasets. 
The climate indices include 27 Climdex indices established by the Expert Team on Climate Change Detection and Indices (ETCCDI) and 4 additional indices that are slightly modified from the Climdex indices. These indices are calculated from daily precipitation and temperature values from the downscaled simulations and are available at annual or monthly temporal resolution, depending on the index. Monthly indices are also available in seasonal and annual versions.

    Note: projected future changes by statistically downscaled products are not necessarily more credible than those by the underlying climate model outputs. In many cases, especially for absolute threshold-based indices, projections based on downscaled data have a smaller spread because of the removal of model biases. However, this is not the case for all indices. Downscaling from GCM resolution to the fine resolution needed for impacts assessment increases the level of spatial detail and temporal variability to better match observations. Since these adjustments are GCM dependent, the resulting indices could have a wider spread when computed from downscaled data as compared to those directly computed from GCM output. In the latter case, it is not the downscaling procedure that makes future projection more uncertain; rather, it is indicative of higher variability associated with finer spatial scale.

    Individual model datasets and all related derived products are subject to the terms of use (https://pcmdi.llnl.gov/CMIP6/TermsOfUse/TermsOfUse6-1.html) of the source organization.
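To illustrate how an absolute threshold-based Climdex index is derived from daily values, here is a minimal sketch computing frost days (FD: count of days with daily minimum temperature below 0 °C) from a synthetic series. The real indices are computed from the downscaled NetCDF files; this sketch does not read those files and is only meant to show the shape of the calculation:

```python
import numpy as np

def frost_days(tmin_daily_celsius):
    """Climdex FD: number of days in the period with Tmin < 0 degC."""
    tmin = np.asarray(tmin_daily_celsius, dtype=float)
    return int((tmin < 0.0).sum())

# synthetic year of daily minima (degC); not real downscaled output
rng = np.random.default_rng(0)
tmin = rng.normal(loc=5.0, scale=8.0, size=365)
print(frost_days(tmin))
```

The same count-over-threshold pattern underlies several of the other Climdex indices (e.g., summer days, tropical nights), only the variable and threshold change.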

  13. ‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-how-every-nfl-teams-fans-lean-politically-550a/f911ccf2/?iid=003-014&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘How Every NFL Team’s Fans Lean Politically?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nfl-fandome on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Data behind the story How Every NFL Team’s Fans Lean Politically.

    Google Trends Data

    Google Trends data was derived from comparing 5-year search traffic for the 7 sports leagues we analyzed:

    https://g.co/trends/5P8aa

    Results are listed by designated market area (DMA).

    The percentages are the approximate percentage of major-sports searches that were conducted for each league.

    Trump's percentage is his share of the vote within the DMA in the 2016 presidential election.

    SurveyMonkey Data

    SurveyMonkey data was derived from a poll of American adults ages 18 and older, conducted between Sept. 1-7, 2017.

    Listed numbers are the raw totals for respondents who ranked a given NFL team among their three favorites, and how many identified with a given party (further broken down by race). We also list the percentages of the entire sample that identified with each party, and were of each race.

    The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight; its columns are largely unlabeled (Unnamed: 1, Unnamed: 3, Unnamed: 4, Unnamed: 10, and so on), along with technical information and other features.

    How to use this dataset

    • Analyze Unnamed: 13 in relation to Unnamed: 21
    • Study the influence of Unnamed: 7 on Unnamed: 12
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  14. ‘💣 Global Terrorism Database (GTD)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘💣 Global Terrorism Database (GTD)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-global-terrorism-database-gtd-b0b8/e6145d55/?iid=005-149&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘💣 Global Terrorism Database (GTD)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/terrorisme on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    See Readme for more details.
    This repository contains a selection of the data -- and the data-processing scripts -- behind the articles, graphics and interactives at FiveThirtyEight.

    We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know (andrei.scheinkman@fivethirtyeight.com).

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight and contains per-country columns such as Ireland, Denmark, Greece, and Luxembourg, along with technical information and other features.

    How to use this dataset

    • Analyze Germany in relation to Italy
    • Study the influence of France on United Kingdom
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight

    Start A New Notebook!

    --- Original source retains full ownership of the source dataset ---

  15. Sustainable Development Goals

    • kaggle.com
    zip
    Updated Jan 12, 2019
    + more versions
    Cite
    World Bank (2019). Sustainable Development Goals [Dataset]. https://www.kaggle.com/theworldbank/sustainable-development-goals
    Explore at:
    zip(20674194 bytes)Available download formats
    Dataset updated
    Jan 12, 2019
    Dataset authored and provided by
    World Bankhttp://worldbank.org/
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    Relevant indicators drawn from the World Development Indicators, reorganized according to the goals and targets of the Sustainable Development Goals (SDGs). These indicators may help to monitor SDGs, but they are not always the official indicators for SDG monitoring.

    Context

    This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using the World Bank's APIs and Kaggle's API.

    Cover photo by NA on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  16. Privacy-Sensitive Conversations between Care Workers and Care Home Residents...

    • researchdata.tuwien.ac.at
    • test.researchdata.tuwien.ac.at
    bin, text/markdown
    Updated Feb 25, 2025
    Cite
    Reinhard Grabler; Reinhard Grabler; Michael Starzinger; Michael Starzinger; Matthias Hirschmanner; Matthias Hirschmanner; Helena Anna Frijns; Helena Anna Frijns (2025). Privacy-Sensitive Conversations between Care Workers and Care Home Residents in a Residential Care Home [Dataset]. http://doi.org/10.48436/q1kt0-edc53
    Explore at:
    bin, text/markdownAvailable download formats
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    TU Wien
    Authors
    Reinhard Grabler; Reinhard Grabler; Michael Starzinger; Michael Starzinger; Matthias Hirschmanner; Matthias Hirschmanner; Helena Anna Frijns; Helena Anna Frijns
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 2024 - Aug 2024
    Description

    Dataset Card for "privacy-care-interactions"

    Table of Contents

    Dataset Description

    Purpose and Features

    🔒 Collection of Privacy-Sensitive Conversations between Care Workers and Care Home Residents in a Residential Care Home 🔒

    The dataset is useful to train and evaluate models to identify and classify privacy-sensitive parts of conversations from text, especially in the context of AI assistants and LLMs.

    Dataset Overview

    Language Distribution 🌍

    • English (en): 95

    Locale Distribution 🌎

    • United States (US) 🇺🇸: 95

    Key Facts 🔑

    • This is synthetic data! Generated using proprietary algorithms - no privacy violations!
    • Conversations are classified following the taxonomy for privacy-sensitive robotics by Rueben et al. (2017).
    • The data was manually labeled by an expert.

    Dataset Structure

    Data Instances

    The provided data format is .jsonl, the JSON Lines text format, also called newline-delimited JSON. An example entry looks as follows.

    { "text": "CW: Have you ever been to Italy? CR: Oh, yes... many years ago.", "taxonomy": 0, "category": 0, "affected_speaker": 1, "language": "en", "locale": "US", "data_type": 1, "uid": 16, "split": "train" }

    Data Fields

    The data fields are:

    • text: a string feature. The speaker abbreviations refer to the care worker (CW) and the care recipient (CR).
    • taxonomy: a classification label, with possible values including informational (0), invasion (1), collection (2), processing (3), dissemination (4), physical (5), personal-space (6), territoriality (7), intrusion (8), obtrusion (9), contamination (10), modesty (11), psychological (12), interrogation (13), psychological-distance (14), social (15), association (16), crowding-isolation (17), public-gaze (18), solitude (19), intimacy (20), anonymity (21), reserve (22). The taxonomy is derived from Rueben et al. (2017). The classifications were manually labeled by an expert.
    • category: a classification label, with possible values including personal-information (0), family (1), health (2), thoughts (3), values (4), acquaintance (5), appointment (6). The privacy category affected in the conversation. The classifications were manually labeled by an expert.
    • affected_speaker: a classification label, with possible values including care-worker (0), care-recipient (1), other (2), both (3). The speaker whose privacy is impacted during the conversation. The classifications were manually labeled by an expert.
    • language: a string feature. Language code as defined by ISO 639.
    • locale: a string feature. Regional code as defined by ISO 3166-1 alpha-2.
    • data_type: a classification label, with possible values including real (0) and synthetic (1).
    • uid: an int64 feature. A unique identifier within the dataset.
    • split: a string feature. Either train, validation or test.
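Given the JSON Lines format and the fields above, records can be decoded with the standard library and their numeric labels resolved back to names. A minimal sketch, using the example entry shown earlier (the label-name lists are copied from the field descriptions; their order is assumed to match the integer codes):

```python
import json

# label names as listed in the data fields; order assumed to match the codes
AFFECTED_SPEAKER = ["care-worker", "care-recipient", "other", "both"]
CATEGORY = ["personal-information", "family", "health", "thoughts",
            "values", "acquaintance", "appointment"]

def parse_record(line):
    """Decode one JSON Lines entry and resolve its numeric labels."""
    rec = json.loads(line)
    rec["category_name"] = CATEGORY[rec["category"]]
    rec["affected_speaker_name"] = AFFECTED_SPEAKER[rec["affected_speaker"]]
    return rec

line = ('{"text": "CW: Have you ever been to Italy? CR: Oh, yes... many '
        'years ago.", "taxonomy": 0, "category": 0, "affected_speaker": 1, '
        '"language": "en", "locale": "US", "data_type": 1, "uid": 16, '
        '"split": "train"}')
rec = parse_record(line)
print(rec["category_name"], rec["affected_speaker_name"])
```

Reading a whole file is then a loop over its lines, e.g. `[parse_record(l) for l in open("split-train-en.jsonl")]`.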

    Dataset Splits

    The dataset has 2 subsets:

    • split: with a total of 95 examples split into train, validation and test (70%-15%-15%)
    • unsplit: with a total of 95 examples in a single train split
    name      train   validation   test
    split     66      14           15
    unsplit   95      n/a          n/a

    The files follow the naming convention subset-split-language.jsonl. The following files are contained in the dataset:

    • split-train-en.jsonl
    • split-validation-en.jsonl
    • split-test-en.jsonl
    • unsplit-train-en.jsonl

    Dataset Creation

    Curation Rationale

    Recording audio of care workers and residents during care interactions, which includes partial and full body washing, giving of medication, as well as wound care, is a highly privacy-sensitive use case. Therefore, a dataset is created, which includes privacy-sensitive parts of conversations, synthesized from real-world data. This dataset serves as a basis for fine-tuning a local LLM to highlight and classify privacy-sensitive sections of transcripts created in care interactions, to further mask them to protect privacy.

    Source Data

    Initial Data Collection

    The initial data was collected in the project Caring Robots of TU Wien in cooperation with Caritas Wien. One project track aims to use Large Language Models (LLMs) to support the documentation work of care workers, with LLM-generated summaries of audio recordings of interactions between care workers and care home residents. The initial data are the transcriptions of those care interactions.

    Data Processing

    The transcriptions were thoroughly reviewed, and sections containing privacy-sensitive information were identified and marked using qualitative data analysis software by two experts. Subsequently, the sections were translated from German to U.S. English using the locally executed LLM icky/translate. In the next step, another locally executed LLM, llama3.1:70b, was used to synthesize the conversation segments. This process involved generating similar, yet distinct and new, conversations that are not linked to the original data. The dataset was split using the train_test_split function from scikit-learn (sklearn.model_selection.train_test_split).

  17. MODIS Thermal (Last 7 days)

    • wifire-data.sdsc.edu
    Updated Mar 3, 2023
    + more versions
    Cite
    Esri (2023). MODIS Thermal (Last 7 days) [Dataset]. https://wifire-data.sdsc.edu/dataset/modis-thermal-last-7-days
    Explore at:
    html, zip, csv, kml, geojson, arcgis geoservices rest apiAvailable download formats
    Dataset updated
    Mar 3, 2023
    Dataset provided by
    Esrihttp://esri.com/
    Description

    This layer presents detectable thermal activity from MODIS satellites for the last 7 days. MODIS Global Fires is a product of NASA’s Earth Observing System Data and Information System (EOSDIS), part of NASA's Earth Science Data. EOSDIS integrates remote sensing and GIS technologies to deliver global MODIS hotspot/fire locations to natural resource managers and other stakeholders around the World.


    Consumption Best Practices:

    • As a service that is subject to viral loads (very high usage), avoid adding filters that use a Date/Time type field. These queries are not cacheable and WILL be subject to Rate Limiting (https://en.wikipedia.org/wiki/Rate_limiting) by ArcGIS Online. To accommodate filtering events by Date/Time, we encourage using the included "Age" fields, which maintain the number of days or hours since a record was created or last modified as of the last service update. These queries fully support response caching, allowing common query results to be supplied to many users without adding load on the service.
    • When ingesting this service in your applications, avoid using POST requests, these requests are not cacheable and will also be subject to Rate Limiting measures.
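Following the best practices above, a cacheable GET query can be built against the layer's query endpoint by filtering on an Age-style field instead of a Date/Time field. A minimal sketch; the endpoint URL and the exact Age field name (`HOURS_OLD` here) are assumptions, so check the service's metadata for the real names:

```python
from urllib.parse import urlencode

# hypothetical feature-layer query endpoint; consult the service metadata
# for the real URL and the exact name of the Age field
BASE = "https://services.example.com/arcgis/rest/services/MODIS_Thermal/FeatureServer/0/query"

def hotspot_query_url(max_age_hours):
    """Build a cache-friendly GET URL filtering on an Age field."""
    params = {
        "where": f"HOURS_OLD <= {int(max_age_hours)}",  # Age field, not Date/Time
        "outFields": "*",
        "f": "geojson",
    }
    return BASE + "?" + urlencode(params)

print(hotspot_query_url(24))
```

Because the `where` clause is identical for every client asking for "last 24 hours", the response can be served from cache rather than recomputed per request.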

    Scale/Resolution: 1km

    Update Frequency: 1/2 Hour (every 30 minutes) using the Aggregated Live Feed Methodology

    Area Covered: World

    What can I do with this layer?
    The MODIS thermal activity layer can be used to visualize and assess wildfires worldwide. However, it should be noted that this dataset contains many “false positives” (e.g., oil/natural gas wells or volcanoes) since the satellite will detect any large thermal signal.

    Additional Information
    MODIS stands for MODerate resolution Imaging Spectroradiometer. The MODIS instrument is on board NASA’s Earth Observing System (EOS) Terra (EOS AM) and Aqua (EOS PM) satellites. The orbit of the Terra satellite goes from north to south across the equator in the morning and Aqua passes south to north over the equator in the afternoon resulting in global coverage every 1 to 2 days. The EOS satellites have a ±55 degree scanning pattern and orbit at 705 km with a 2,330 km swath width.

    It takes approximately 2 – 4 hours after satellite overpass for MODIS Rapid Response to process the data, and for the Fire Information for Resource Management System (FIRMS) to update the website. Occasionally, hardware errors can result in processing delays beyond the 2-4 hour range. Additional information on the MODIS system status can be found at MODIS Rapid Response.

    Attribute Information
    • Latitude and Longitude: The center point location of the 1km (approx.) pixel flagged as containing one or more fires/hotspots (fire size is not 1km, but variable). Stored by Point Geometry. See What does a hotspot/fire detection mean on the ground?
    • Brightness: The brightness temperature measured (in Kelvin) using the MODIS channels 21/22 and channel 31.
    • Scan and Track: The actual spatial resolution of the scanned pixel. Although the algorithm works at 1km resolution, the MODIS pixels get bigger toward the edge of the scan. See What does scan and track mean?
    • Date and Time: Acquisition date of the hotspot/active fire pixel and time of satellite overpass in UTC (client presentation in local time). Stored by Acquisition Date.
    • Acquisition Date: Derived Date/Time field combining Date and Time attributes.
    • Satellite: Whether the detection was picked up by the Terra or Aqua satellite.
    • Confidence: The detection confidence is a quality flag of the individual hotspot/active fire pixel.
    • Version: Version refers to the processing collection and source of data. The number before the decimal refers to the collection (e.g. MODIS Collection 6). The number after the decimal indicates the source of Level 1B data; data processed in near-real time by MODIS Rapid Response will have the source code “CollectionNumber.0”. Data sourced from MODAPS (with a 2-month lag) and processed by FIRMS using the standard MOD14/MYD14 Thermal Anomalies algorithm will have a source code “CollectionNumber.x”. For example, data with the version listed as 5.0 is collection 5, processed by MRR, data with the version listed as 5.1 is collection 5 data processed by FIRMS using Level 1B data from MODAPS.
    • Bright.T31: Channel 31 brightness temperature (in Kelvins) of the hotspot/active fire pixel.
    • FRP: Fire Radiative Power. Depicts the pixel-integrated fire radiative power in MW (megawatts). FRP provides information on the measured radiant heat output of detected fires. The amount of radiant heat energy liberated per unit time (the Fire Radiative Power) is thought to be related to the rate at which fuel is being consumed (Wooster et al., 2005).
    • DayNight: The standard processing algorithm uses the solar zenith angle (SZA) to threshold the day/night value; if the SZA exceeds 85 degrees, the pixel is assigned a night value. SZA values less than 85 degrees are assigned a daytime value. For the NRT algorithm the day/night flag is assigned by ascending (day) vs descending (night) observation. It is expected that the NRT assignment of the day/night flag will be amended to be consistent with the standard processing.
    • Hours Old: Derived field that provides age of record in hours between Acquisition date/time and latest update date/time. 0 = less than 1 hour ago, 1 = less than 2 hours ago, 2 = less than 3 hours ago, and so on.
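A minimal sketch of the DayNight and Hours Old derivations described above; the field semantics are taken from the attribute descriptions, while the function names and sample timestamps are my own:

```python
from datetime import datetime, timezone

def day_night_flag(solar_zenith_angle: float) -> str:
    """Standard-processing rule: SZA above 85 degrees is night, below is day."""
    return "N" if solar_zenith_angle > 85 else "D"

def hours_old(acquired: datetime, latest_update: datetime) -> int:
    """Age bucket: 0 = less than 1 hour ago, 1 = less than 2 hours ago, etc."""
    return int((latest_update - acquired).total_seconds() // 3600)

# Example: a hotspot acquired 2h45m before the latest service update.
acq = datetime(2022, 6, 22, 10, 30, tzinfo=timezone.utc)
upd = datetime(2022, 6, 22, 13, 15, tzinfo=timezone.utc)
print(hours_old(acq, upd))   # 2, i.e. "less than 3 hours ago"
print(day_night_flag(92.4))  # N
```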
    Revisions
    • June 22, 2022: Added 'HOURS_OLD' field to enhance Filtering data. Added 'Last 7 days' Layer to extend data to match time range of VIIRS offering. Added Field level descriptions.
    This map is provided for informational purposes and is not monitored 24/7 for accuracy.

  18.

    ‘Predicting Women's NBA (WNBA)’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Predicting Women's NBA (WNBA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-predicting-women-s-nba-wnba-dbae/latest
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Predicting Women's NBA (WNBA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/wnba-forecastse on 28 January 2022.

    --- Dataset description provided by original source is as follows ---


    About this dataset

    This file contains links to the data behind our WNBA Predictions. More information on how our WNBA Elo model works can be found in this article.

    wnba_elo.csv contains game-by-game Elo ratings and forecasts since 1997.

    wnba_elo_latest.csv contains game-by-game Elo ratings and forecasts for only the latest season.
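As a rough illustration of working with these files, the sketch below parses a tiny inline sample with Python's csv module; the column names and rating values are assumed for illustration, not taken from the actual files:

```python
import csv
import io

# Assumed columns, loosely modeled on the dataset's described features
# (Date, Home Team, Away Team, pre/postgame Elo ratings).
sample = """date,home_team,away_team,home_pregame_rating,home_postgame_rating
1997-06-21,HOU,CLE,1500,1512
1997-06-21,LAS,NYL,1500,1489
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# In an Elo system, postgame minus pregame rating gives the rating change
# the home team earned (or lost) from that game.
change = float(rows[0]["home_postgame_rating"]) - float(rows[0]["home_pregame_rating"])
print(change)  # 12.0
```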

    License

    Data released under the Creative Commons Attribution 4.0 License

    Source

    GitHub

    This dataset was created by data.world's Admin and contains around 6,000 samples with features such as Date, Home Team, Away Team, Home Team Postgame Rating, technical information, and more.

    How to use this dataset

    • Analyze Neutral in relation to Home Team Pregame Rating
    • Study the influence of Away Team Postgame Rating on Season
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit data.world's Admin

    --- Original source retains full ownership of the source dataset ---

  19.

    CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 4, 2024
    + more versions
    Cite
    Richard Anselmo Stahl (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7957401
    Explore at:
    Dataset updated
    Jun 4, 2024
    Dataset provided by
    Mark Granroth-Wilding
    Vilhelm von Ehrenheim
    Dhiana Deva Cavacanti Rocha
    Richard Anselmo Stahl
    Armin Catovic
    Lele Cao
    Drew McCornack
    Description

    CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

    Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

    Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

    Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:

    Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.

    Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each with 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.

    Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.

    Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).

    Background and Motivation

    In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.

    While there is no universally agreed definition of company similarity, researchers and practitioners in the private equity (PE) industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

    In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
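The embedding-space search described above reduces to computing cosine similarity between description embeddings; a minimal sketch with purely illustrative three-dimensional vectors (real embeddings such as mSBERT or ADA2 have hundreds to thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

seed = [0.8, 0.1, 0.3]       # embedding of a seed company (illustrative values)
candidate = [0.7, 0.2, 0.4]  # embedding of a candidate company

# Higher values (closer to 1.0) mean the descriptions point in similar directions.
print(round(cosine_similarity(seed, candidate), 3))
```

Ranking all candidate companies by this score against each seed company yields the expanded search list the investment workflow needs.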

    However, graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.

    Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2

    Paper: to be published

  20.

    WDPA - World Database on Protected Areas polygons from WCMC

    • hub.arcgis.com
    • globil-panda.opendata.arcgis.com
    Updated Dec 30, 2016
    Cite
    World Wide Fund for Nature (2016). WDPA - World Database on Protected Areas polygons from WCMC [Dataset]. https://hub.arcgis.com/maps/61cde74cf99645b7b2c30212514ddae5
    Explore at:
    Dataset updated
    Dec 30, 2016
    Dataset authored and provided by
    World Wide Fund for Nature
    Area covered
    Description

    The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas and is one of the key global biodiversity datasets being widely used by scientists, businesses, governments, international secretariats and others to inform planning, policy decisions and management. The WDPA is a joint project between the United Nations Environment Programme (UNEP) and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by UNEP World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data, which are made available online through the Protected Planet website, where the data is both viewable and downloadable.

    Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets.

    Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative. For more details on the WDPA, please read through the WDPA User Manual.

