57 datasets found

Data from: World Database on Protected Areas
kiribati-data.sprep.org
pacificdata.org
+13more
jpg, pdf
Updated Feb 20, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Secretariat of the Pacific Regional Environment Programme (2025). World Database on Protected Areas [Dataset]. https://kiribati-data.sprep.org/dataset/world-database-protected-areas-0
Explore at:
jpg(577876), pdf(2100272)Available download formats
Dataset updated
Feb 20, 2025
Dataset provided by
Pacific Regional Environment Programmehttps://www.sprep.org/
License
Public Domain Mark 1.0https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
POLYGON ((205.20703554153 -28.505385171432, 129.26953554153 1.4588219018416, 204.92578983307 6.2279312638895)), 167.23828554153 25.085596467854, 141.3632941246 -0.22851555560937, 129.26953554153 27.605668449605, 160.20703554153 -29.489341672009, 141.92578554153 -11.126668087769, Pacific Region
Description
The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas, updated on a monthly basis, and is one of the key global biodiversity data sets being widely used by scientists, businesses, governments, International secretariats and others to inform planning, policy decisions and management. The WDPA is a joint project between UN Environment and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by UN Environment World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data which are made available online through the Protected Planet website where the data is both viewable and downloadable. Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets. Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.
Global Green Economy Index (GGEI)
kaggle.com
Updated May 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeremy Tamanini (2024). Global Green Economy Index (GGEI) [Dataset]. https://www.kaggle.com/datasets/jeremytamanini/global-green-economy-index-ggei
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 8, 2024
Dataset provided by
Kaggle
Authors
Jeremy Tamanini
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
For the first time, the full results from the Global Green Economy Index (GGEI) are available in the public domain. Historically, only the aggregate results have been publicly accessible. The full dataset has been paywalled and accessible to our subscribers only. But the way in which we release GGEI data to the public is changing. Read on for a quick explanation for how and why.

First, the how. The GGEI file publicly accessible today represents that dataset officially compiled in 2022. It contains the full results for each of the 18 indicators in the GGEI for 160 countries, across the four main dimensions of climate change & social equity, sector decarbonization, markets & ESG investment and the environment. Some (not all) of these data points have since been updated, as new datasets have been published. The GGEI is a dynamic model, updating in real-time as new data becomes available. Our subscribing clients will still receive this most timely version of the model, along with any customizations they may request.

Now, the why. First and foremost, there is huge demand among academic researchers globally for the full GGEI dataset. Academic inquiry around the green transition, sustainable development, ESG investing, and green energy systems has exploded over the past several years. We receive hundreds of inquiries annually from these students and researchers to access the full GGEI dataset. Making it publicly accessible as we are today makes it easier for these individuals and institutions to use these GGEI to promote learning and green progress within their institutions.

More broadly, the landscape for data has changed significantly. A decade ago when the GGEI was first published, datasets existed more in silos and users might subscribe to one specific dataset like the GGEI to answer a specific question. But today, data usage in the sustainability space has become much more of a system, whereby myriad data sources are synthesized into increasingly sophisticated models, often fueled by artificial intelligence. Making the GGEI more accessible will accelerate how this perspective on the global green economy can be integrated to these systems.
d
Protected Areas Database of the United States (PAD-US) 2.1 - World Database...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 2.1 - World Database on Protected Areas (WDPA) Submission (ver. 1.1, April 2021) [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1-world-database-on-protected-areas
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
United States
Description
The United States Geological Survey (USGS) - Science Analytics and Synthesis (SAS) - Gap Analysis Project (GAP) manages the Protected Areas Database of the United States (PAD-US), an Arc10x geodatabase, that includes a full inventory of areas dedicated to the preservation of biological diversity and to other natural, recreation, historic, and cultural uses, managed for these purposes through legal or other effective means (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/protected-areas). The PAD-US is developed in partnership with many organizations, including coordination groups at the [U.S.] Federal level, lead organizations for each State, and a number of national and other non-governmental organizations whose work is closely related to the PAD-US. Learn more about the USGS PAD-US partners program here: www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards. The United Nations Environmental Program - World Conservation Monitoring Centre (UNEP-WCMC) tracks global progress toward biodiversity protection targets enacted by the Convention on Biological Diversity (CBD) through the World Database on Protected Areas (WDPA) and World Database on Other Effective Area-based Conservation Measures (WD-OECM) available at: www.protectedplanet.net. See the Aichi Target 11 dashboard (www.protectedplanet.net/en/thematic-areas/global-partnership-on-aichi-target-11) for official protection statistics recognized globally and developed for the CBD, or here for more information and statistics on the United States of America's protected areas: www.protectedplanet.net/country/USA. It is important to note statistics published by the National Oceanic and Atmospheric Administration (NOAA) Marine Protected Areas (MPA) Center (www.marineprotectedareas.noaa.gov/dataanalysis/mpainventory/) and the USGS-GAP (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-statistics-and-reports) differ from statistics published by the UNEP-WCMC as methods to remove overlapping designations differ slightly and U.S. Territories are reported separately by the UNEP-WCMC (e.g. The largest MPA, "Pacific Remote Islands Marine Monument" is attributed to the United States Minor Outlying Islands statistics). At the time of PAD-US 2.1 publication (USGS-GAP, 2020), NOAA reported 26% of U.S. marine waters (including the Great Lakes) as protected in an MPA that meets the International Union for Conservation of Nature (IUCN) definition of biodiversity protection (www.iucn.org/theme/protected-areas/about). USGS-GAP plans to publish PAD-US 2.1 Statistics and Reports in the spring of 2021. The relationship between the USGS, the NOAA, and the UNEP-WCMC is as follows: - USGS manages and publishes the full inventory of U.S. marine and terrestrial protected areas data in the PAD-US representing many values, developed in collaboration with a partnership network in the U.S. and; - USGS is the primary source of U.S. marine and terrestrial protected areas data for the WDPA, developed from a subset of the PAD-US in collaboration with the NOAA, other agencies and non-governmental organizations in the U.S., and the UNEP-WCMC and; - UNEP-WCMC is the authoritative source of global protected area statistics from the WDPA and WD-OECM and; - NOAA is the authoritative source of MPA data in the PAD-US and MPA statistics in the U.S. and; - USGS is the authoritative source of PAD-US statistics (including areas primarily managed for biodiversity, multiple uses including natural resource extraction, and public access). The PAD-US 2.1 Combined Marine, Fee, Designation, Easement feature class (GAP Status Code 1 and 2 only) is the source of protected areas data in this WDPA update. Tribal areas and military lands represented in the PAD-US Proclamation feature class as GAP Status Code 4 (no known mandate for biodiversity protection) are not included as spatial data to represent internal protected areas are not available at this time. The USGS submitted more than 42,900 protected areas from PAD-US 2.1, including all 50 U.S. States and 6 U.S. Territories, to the UNEP-WCMC for inclusion in the May 2021 WDPA, available at www.protectedplanet.net. The NOAA is the sole source of MPAs in PAD-US and the National Conservation Easement Database (NCED, www.conservationeasement.us/) is the source of conservation easements. The USGS aggregates authoritative federal lands data directly from managing agencies for PAD-US (www.communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/), while a network of State data-stewards provide state, local government lands, and some land trust preserves. National nongovernmental organizations contribute spatial data directly (www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-data-stewards). The USGS translates the biodiversity focused subset of PAD-US into the WDPA schema (UNEP-WCMC, 2019) for efficient aggregation by the UNEP-WCMC. The USGS maintains WDPA Site Identifiers (WDPAID, WDPA_PID), a persistent identifier for each protected area, provided by UNEP-WCMC. Agency partners are encouraged to track WDPA Site Identifier values in source datasets to improve the efficiency and accuracy of PAD-US and WDPA updates. The IUCN protected areas in the U.S. are managed by thousands of agencies and organizations across the country and include over 42,900 designated sites such as National Parks, National Wildlife Refuges, National Monuments, Wilderness Areas, some State Parks, State Wildlife Management Areas, Local Nature Preserves, City Natural Areas, The Nature Conservancy and other Land Trust Preserves, and Conservation Easements. The boundaries of these protected places (some overlap) are represented as polygons in the PAD-US, along with informative descriptions such as Unit Name, Manager Name, and Designation Type. As the WDPA is a global dataset, their data standards (UNEP-WCMC 2019) require simplification to reduce the number of records included, focusing on the protected area site name and management authority as described in the Supplemental Information section in this metadata record. Given the numerous organizations involved, sites may be added or removed from the WDPA between PAD-US updates. These differences may reflect actual change in protected area status; however, they also reflect the dynamic nature of spatial data or Geographic Information Systems (GIS). Many agencies and non-governmental organizations are working to improve the accuracy of protected area boundaries, the consistency of attributes, and inventory completeness between PAD-US updates. In addition, USGS continually seeks partners to review and refine the assignment of conservation measures in the PAD-US.
D
Data for "Prediction of Search Targets From Fixations in Open-World...
darus.uni-stuttgart.de
Updated Oct 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andreas Bulling (2022). Data for "Prediction of Search Targets From Fixations in Open-World Settings" [Dataset]. http://doi.org/10.18419/DARUS-3226
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.18419/DARUS-3226
Dataset updated
Oct 28, 2022
Dataset provided by
DaRUS
Authors
Andreas Bulling
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
World
Dataset funded by
DFG
Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University
Description
We designed a human study to collect fixation data during visual search. We opted for a task that involved searching for a single image (the target) within a synthesised collage of images (the search set). Each of the collages are the random permutation of a finite set of images. To explore the impact of the similarity in appearance between target and search set on both fixation behaviour and automatic inference, we have created three different search tasks covering a range of similarities. In prior work, colour was found to be a particularly important cue for guiding search to targets and target-similar objects. Therefore we have selected for the first task 78 coloured O'Reilly book covers to compose the collages. These covers show a woodcut of an animal at the top and the title of the book in a characteristic font underneath. Given that overall cover appearance was very similar, this task allows us to analyse fixation behaviour when colour is the most discriminative feature. For the second task we use a set of 84 book covers from Amazon. In contrast to the first task, appearance of these covers is more diverse. This makes it possible to analyse fixation behaviour when both structure and colour information could be used by participants to find the target. Finally, for the third task, we use a set of 78 mugshots from a public database of suspects. In contrast to the other tasks, we transformed the mugshots to grey-scale so that they did not contain any colour information. In this case, allows abalysis of fixation behaviour when colour information was not available at all. We found faces to be particularly interesting given the relevance of searching for faces in many practical applications. 18 participants (9 males), age 18-30 Gaze data recorded with a stationary Tobii TX300 eye tracker More information about the dataset can be found in the README file.
w
Learning Poverty Global Database
data360.worldbank.org
Updated Apr 18, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). Learning Poverty Global Database [Dataset]. https://data360.worldbank.org/en/dataset/WB_LPGD
Explore at:
Dataset updated
Apr 18, 2025
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2001 - 2023
Description
Will all children be able to read by 2030? The ability to read with comprehension is a foundational skill that every education system around the world strives to impart by late in primary school—generally by age 10. Moreover, attaining the ambitious Sustainable Development Goals (SDGs) in education requires first achieving this basic building block, and so does improving countries’ Human Capital Index scores. Yet past evidence from many low- and middle-income countries has shown that many children are not learning to read with comprehension in primary school. To understand the global picture better, we have worked with the UNESCO Institute for Statistics (UIS) to assemble a new dataset with the most comprehensive measures of this foundational skill yet developed, by linking together data from credible cross-national and national assessments of reading. This dataset covers 115 countries, accounting for 81% of children worldwide and 79% of children in low- and middle-income countries. The new data allow us to estimate the reading proficiency of late-primary-age children, and we also provide what are among the first estimates (and the most comprehensive, for low- and middle-income countries) of the historical rate of progress in improving reading proficiency globally (for the 2000-17 period). The results show that 53% of all children in low- and middle-income countries cannot read age-appropriate material by age 10, and that at current rates of improvement, this “learning poverty” rate will have fallen only to 43% by 2030. Indeed, we find that the goal of all children reading by 2030 will be attainable only with historically unprecedented progress. The high rate of “learning poverty” and slow progress in low- and middle-income countries is an early warning that all the ambitious SDG targets in education (and likely of social progress) are at risk. Based on this evidence, we suggest a new medium-term target to guide the World Bank’s work in low- and middle- income countries: cut learning poverty by at least half by 2030. This target, together with improved measurement of learning, can be as an evidence-based tool to accelerate progress to get all children reading by age 10.

For further details, please refer to https://thedocs.worldbank.org/en/doc/e52f55322528903b27f1b7e61238e416-0200022022/original/Learning-poverty-report-2022-06-21-final-V7-0-conferenceEdition.pdf
f
Datasets used for model learning and validation.
figshare.com
xls
Updated Mar 26, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jialiang Sun; Jun Guo; Jian Liu (2024). Datasets used for model learning and validation. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011972.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pcbi.1011972.t001
Dataset updated
Mar 26, 2024
Dataset provided by
PLOS Computational Biology
Authors
Jialiang Sun; Jun Guo; Jian Liu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Using the CRISPR-Cas9 system to perform base substitutions at the target site is a typical technique for genome editing with the potential for applications in gene therapy and agricultural productivity. When the CRISPR-Cas9 system uses guide RNA to direct the Cas9 endonuclease to the target site, it may misdirect it to a potential off-target site, resulting in an unintended genome editing. Although several computational methods have been proposed to predict off-target effects, there is still room for improvement in the off-target effect prediction capability. In this paper, we present an effective approach called CRISPR-M with a new encoding scheme and a novel multi-view deep learning model to predict the sgRNA off-target effects for target sites containing indels and mismatches. CRISPR-M takes advantage of convolutional neural networks and bidirectional long short-term memory recurrent neural networks to construct a three-branch network towards multi-views. Compared with existing methods, CRISPR-M demonstrates significant performance advantages running on real-world datasets. Furthermore, experimental analysis of CRISPR-M under multiple metrics reveals its capability to extract features and validates its superiority on sgRNA off-target effect predictions.
Data from: FISBe: A real-world benchmark dataset for instance segmentation...
zenodo.org
data.niaid.nih.gov
bin, json +3
Updated Apr 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lisa Mais; Lisa Mais; Peter Hirsch; Peter Hirsch; Claire Managan; Claire Managan; Ramya Kandarpa; Josef Lorenz Rumberger; Josef Lorenz Rumberger; Annika Reinke; Annika Reinke; Lena Maier-Hein; Lena Maier-Hein; Gudrun Ihrke; Gudrun Ihrke; Dagmar Kainmueller; Dagmar Kainmueller; Ramya Kandarpa (2024). FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures [Dataset]. http://doi.org/10.5281/zenodo.10875063
Explore at:
zip, text/x-python, bin, json, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10875063
Dataset updated
Apr 2, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Lisa Mais; Lisa Mais; Peter Hirsch; Peter Hirsch; Claire Managan; Claire Managan; Ramya Kandarpa; Josef Lorenz Rumberger; Josef Lorenz Rumberger; Annika Reinke; Annika Reinke; Lena Maier-Hein; Lena Maier-Hein; Gudrun Ihrke; Gudrun Ihrke; Dagmar Kainmueller; Dagmar Kainmueller; Ramya Kandarpa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Feb 26, 2024
Description
General

For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

Summary

A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains

30 completely labeled (segmented) images

71 partly labeled images

altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes between 30-60 min on average, yet a difficult one can take up to 4 hours)

To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects

A set of metrics and a novel ranking score for respective meaningful method benchmarking

An evaluation of three baseline methods in terms of the above metrics and score

Abstract

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

Dataset documentation:

We provide a detailed documentation of our dataset, following the Datasheet for Datasets questionnaire:

>> FISBe Datasheet

Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

Files

fisbe_v1.0_{completely,partly}.zip

contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.

fisbe_v1.0_mips.zip

maximum intensity projections of all samples, for convenience.

sample_list_per_split.txt

a simple list of all samples and the subset they are in, for convenience.

view_data.py

a simple python script to visualize samples, see below for more information on how to use it.

dim_neurons_val_and_test_sets.json

a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.

Readme.md

general information

How to work with the image files

Each sample consists of a single 3d MCFO image of neurons of the fruit fly.
For each image, we provide a pixel-wise instance segmentation for all separable neurons.
Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.").
The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.
The segmentation mask for each neuron is stored in a separate channel.
The order of dimensions is CZYX.

We recommend to work in a virtual environment, e.g., by using conda:

conda create -y -n flylight-env -c conda-forge python=3.9
conda activate flylight-env

How to open zarr files

Install the python zarr package:
pip install zarr

Opened a zarr file with:

import zarr
raw = zarr.open(
seg = zarr.open(

# optional:
import numpy as np
raw_np = np.array(raw)

Zarr arrays are read lazily on-demand.
Many functions that expect numpy arrays also work with zarr arrays.
Optionally, the arrays can also explicitly be converted to numpy arrays.

How to view zarr image files

We recommend to use napari to view the image data.

Install napari:
pip install "napari[all]"

Save the following Python script:

import zarr, sys, napari

raw = zarr.load(sys.argv[1], mode='r', path="volumes/raw")
gts = zarr.load(sys.argv[1], mode='r', path="volumes/gt_instances")

viewer = napari.Viewer(ndisplay=3)
for idx, gt in enumerate(gts):
viewer.add_labels(
gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
viewer.add_image(raw[1], colormap="green", name='raw_g', blending='additive')
viewer.add_image(raw[2], colormap="blue", name='raw_b', blending='additive')
napari.run()

Execute:
python view_data.py

Metrics

S: Average of avF1 and C

avF1: Average F1 Score

C: Average ground truth coverage

clDice_TP: Average true positives clDice

FS: Number of false splits

FM: Number of false merges

tp: Relative number of true positives

For more information on our selected metrics and formal definitions please see our paper.

Baseline

To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN) and a non-learnt application-specific color clustering from Duan et al..
For detailed information on the methods and the quantitative results please see our paper.

License

The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Citation

If you use FISBe in your research, please use the following BibTeX entry:

@misc{mais2024fisbe, title = {FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures}, author = {Lisa Mais and Peter Hirsch and Claire Managan and Ramya Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller}, year = 2024, eprint = {2404.00130}, archivePrefix ={arXiv}, primaryClass = {cs.CV} }

Acknowledgments

We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable
discussions.
P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.
This work was co-funded by Helmholtz Imaging.

Changelog

There have been no changes to the dataset so far.
All future change will be listed on the changelog page.

Contributing

If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

All contributions are welcome!
On Premises Real Time Database Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataintelo (2025). On Premises Real Time Database Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/on-premises-real-time-database-market
Explore at:
pdf, csv, pptxAvailable download formats
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policyhttps://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
On Premises Real Time Database Market Outlook

The global market size for on-premises real-time database solutions was estimated at USD 12.3 billion in 2023, and it is projected to reach USD 25.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.6% during the forecast period. This growth is driven by several factors, including the increasing need for efficient data management and real-time data analytics capabilities across various industry verticals such as BFSI, healthcare, retail, and manufacturing.

One of the primary growth factors for the on-premises real-time database market is the increasing volume of data generated by organizations. With the proliferation of IoT devices, social media platforms, and e-commerce activities, the amount of data generated is growing exponentially. Organizations are increasingly looking for robust database solutions that can handle real-time data processing and analytics to gain actionable insights and maintain a competitive edge. This trend is particularly evident in sectors like retail and manufacturing, where real-time data can significantly enhance operational efficiency and customer experience.

Another significant growth driver is the need for enhanced data security and compliance. While cloud-based solutions offer scalability and flexibility, many organizations, particularly in the BFSI and healthcare sectors, prefer on-premises databases due to stringent data security and compliance requirements. On-premises solutions provide organizations with greater control over their data, allowing them to implement tailored security measures and ensure compliance with industry-specific regulations such as GDPR, HIPAA, and others. This increased focus on data security is likely to continue driving the demand for on-premises real-time database solutions.

The technological advancements in database management systems are also propelling market growth. Innovations such as in-memory databases, multi-model databases, and enhanced query processing capabilities are enabling organizations to achieve faster data retrieval and improved performance. Additionally, the integration of artificial intelligence and machine learning algorithms in database systems is providing advanced analytics capabilities, further enhancing the value proposition of on-premises real-time databases. These technological advancements are expected to attract more organizations to invest in on-premises solutions.

Operational Database Management System (ODBMS) plays a pivotal role in the landscape of on-premises real-time databases. These systems are designed to handle a wide array of data management tasks, including transaction processing, data retrieval, and storage management, all in real-time. The efficiency of an ODBMS is crucial for businesses that require immediate access to data to make timely decisions. In sectors like finance and healthcare, where data accuracy and speed are paramount, the implementation of a robust ODBMS ensures that organizations can maintain high performance and reliability. Furthermore, with the integration of advanced features such as in-memory processing and multi-model support, ODBMS solutions are becoming increasingly sophisticated, offering enhanced capabilities to meet the growing demands of modern enterprises.

Regionally, North America holds the largest market share due to the early adoption of advanced technologies and the presence of major industry players. The region's strong emphasis on data security and regulatory compliance also supports the adoption of on-premises solutions. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, driven by the rapid digital transformation initiatives, increasing IT investments, and the growing importance of real-time data analytics in emerging economies such as China and India.

Component Analysis

When analyzing the on-premises real-time database market by component, it is essential to consider the three main segments: software, hardware, and services. The software component, which includes database management systems and related applications, is the largest segment. This dominance is due to the critical role that software plays in managing, storing, and analyzing real-time data. Organizations are continually seeking advanced software solutions that offer enhanced performance, reliability, and scalability. Innovations in database software, such as in-memory processing and multi-model datab
A
‘Climate Change Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Dec 13, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘Climate Change Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-climate-change-dataset-7e65/4a67af59/?iid=002-150&v=presentation
Explore at:
Dataset updated
Dec 13, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Climate Change Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/climate-change-datae on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Data from World Development Indicators and Climate Change Knowledge Portal on climate systems, exposure to climate impacts, resilience, greenhouse gas emissions, and energy use.

In addition to the data available here and through the Climate Data API, the Climate Change Knowledge Portal has a web interface to a collection of water indicators that may be used to assess the impact of climate change across over 8,000 water basins worldwide. You may use the web interface to download the data for any of these basins.

Here is how to navigate to the water data:

Go to the Climate Change Knowledge Portal home page (http://climateknowledgeportal.worldbank.org/)

Click any region on the map Click a country In the navigation menu

Click "Impacts" and then "Water" Click the map to select a specific water basin

Click "Click here to get access to data and indicators" Please be sure to observe the disclaimers on the website regarding uncertainties and use of the water data.

Attribution: Climate Change Data, World Bank Group.

World Bank Data Catalog Terms of Use

Source: http://data.worldbank.org/data-catalog/climate-change

This dataset was created by World Bank and contains around 10000 samples along with 2009, 1993, technical information and other features such as: - 1994 - Series Code - and more.

How to use this dataset

Analyze 1995 in relation to Scale

Study the influence of 1998 on Country Code

More datasets

Acknowledgements

If you use this dataset in your research, please credit World Bank

--- Original source retains full ownership of the source dataset ---
A
‘What The World Thinks Of Trump?’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘What The World Thinks Of Trump?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-what-the-world-thinks-of-trump-9a66/ef5d3ea5/?iid=007-925&v=presentation
Explore at:
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Analysis of ‘What The World Thinks Of Trump?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/trump-world-truste on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

See Readme for more details.
This repository contains a selection of the data -- and the data-processing scripts -- behind the articles, graphics and interactives at FiveThirtyEight.

2017-09-18 What The World Thinks Of Trump

We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

Source: https://github.com/fivethirtyeight/data

This dataset was created by FiveThirtyEight and contains around 0 samples along with Hungary, South Africa, technical information and other features such as: - Brazil - Kenya - and more.

How to use this dataset

Analyze Russia in relation to Japan

Study the influence of Uk on Spain

More datasets

Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
A
‘50 Years Of World Cup Doppelgangers’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘50 Years Of World Cup Doppelgangers’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-50-years-of-world-cup-doppelgangers-c448/d5846ac8/?iid=003-442&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
World
Description
Analysis of ‘50 Years Of World Cup Doppelgangers’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/world-cup-comparisonse on 28 January 2022.

--- Dataset description provided by original source is as follows ---

https://i.ibb.co/k5sWjcT/Selection-700.png" alt="">

About this dataset

This file contains links to the data behind 50 Years Of World Cup Doppelgangers.

world_cup_comparisons.csv contains all historical players and their associated z-score for each of the 16 metrics.

The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

Source: https://github.com/fivethirtyeight/data

This dataset was created by FiveThirtyEight and contains around 6000 samples along with Fouls Z, Crosses Z, technical information and other features such as: - Clearances Z - Blocks Z - and more.

How to use this dataset

Analyze Boxtouches Z in relation to Fouled Z

Study the influence of Nsxg Z on Team

More datasets

Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
Statistically downscaled climate indices from CMIP6 global climate models...
ouvert.canada.ca
data.urbandatacentre.ca
+3more
html, netcdf
Updated Jan 28, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Environment and Climate Change Canada (2025). Statistically downscaled climate indices from CMIP6 global climate models (CanDCS-U6 & CanDCS-M6) [Dataset]. https://ouvert.canada.ca/data/dataset/764720d5-8c0a-4e1e-93fc-d9e3eb0ab6b3
Explore at:
netcdf, htmlAvailable download formats
Dataset updated
Jan 28, 2025
Dataset provided by
Environment And Climate Change Canadahttps://www.canada.ca/en/environment-climate-change.html
License
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Jan 1, 1951 - Dec 31, 2100
Description
Environment and Climate Change Canada’s (ECCC) Climate Research Division (CRD) and the Pacific Climate Impacts Consortium (PCIC) previously produced statistically downscaled climate scenarios based on simulations from climate models that participated in the Coupled Model Intercomparison Project phase 5 (CMIP5) in 2015. ECCC and PCIC have now updated the CMIP5-based downscaled scenarios with two new sets of downscaled scenarios based on the next generation of climate projections from the Coupled Model Intercomparison Project phase 6 (CMIP6). The scenarios are named Canadian Downscaled Climate Scenarios–Univariate method from CMIP6 (CanDCS-U6) and Canadian Downscaled Climate Scenarios–Multivariate method from CMIP6 (CanDCS-M6). CMIP6 climate projections are based on both updated global climate models and new emissions scenarios called “Shared Socioeconomic Pathways” (SSPs). Statistically downscaled datasets have been produced from 26 CMIP6 global climate models (GCMs) under three different emission scenarios (i.e., SSP1-2.6, SSP2-4.5, and SSP5-8.5), with PCIC later adding SSP3-7.0 to the CanDCS-M6 dataset. The CanDCS-U6 was downscaled using the Bias Correction/Constructed Analogues with Quantile mapping version 2 (BCCAQv2) procedure, and the CanDCS-M6 was downscaled using the N-dimensional Multivariate Bias Correction (MBCn) method. The CanDCS-U6 dataset was produced using the same downscaling target data (NRCANmet) as the CMIP5-based downscaled scenarios, while the CanDCS-M6 dataset implements a new target dataset (ANUSPLIN and PNWNAmet blended dataset). Statistically downscaled individual model output and ensembles are available for download. Downscaled climate indices are available across Canada at 10km grid spatial resolution for the 1950-2014 historical period and for the 2015-2100 period following each of the three emission scenarios. A total of 31 climate indices have been calculated using the CanDCS-U6 and CanDCS-M6 datasets. The climate indices include 27 Climdex indices established by the Expert Team on Climate Change Detection and Indices (ETCCDI) and 4 additional indices that are slightly modified from the Climdex indices. These indices are calculated from daily precipitation and temperature values from the downscaled simulations and are available at annual or monthly temporal resolution, depending on the index. Monthly indices are also available in seasonal and annual versions. Note: projected future changes by statistically downscaled products are not necessarily more credible than those by the underlying climate model outputs. In many cases, especially for absolute threshold-based indices, projections based on downscaled data have a smaller spread because of the removal of model biases. However, this is not the case for all indices. Downscaling from GCM resolution to the fine resolution needed for impacts assessment increases the level of spatial detail and temporal variability to better match observations. Since these adjustments are GCM dependent, the resulting indices could have a wider spread when computed from downscaled data as compared to those directly computed from GCM output. In the latter case, it is not the downscaling procedure that makes future projection more uncertain; rather, it is indicative of higher variability associated with finer spatial scale. Individual model datasets and all related derived products are subject to the terms of use (https://pcmdi.llnl.gov/CMIP6/TermsOfUse/TermsOfUse6-1.html) of the source organization.
A
‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-how-every-nfl-teams-fans-lean-politically-550a/f911ccf2/?iid=003-014&v=presentation
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘How Every NFL Team’s Fans Lean Politically?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nfl-fandome on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Data behind the story How Every NFL Team’s Fans Lean Politically.

Google Trends Data

Google Trends data was derived from comparing 5-year search traffic for the 7 sports leagues we analyzed:

https://g.co/trends/5P8aa

Results are listed by designated market area (DMA).

The percentages are the approximate percentage of major-sports searches that were conducted for each league.

Trump's percentage is his share of the vote within the DMA in the 2016 presidential election.

SurveyMonkey Data

SurveyMonkey data was derived from a poll of American adults ages 18 and older, conducted between Sept. 1-7, 2017.

Listed numbers are the raw totals for respondents who ranked a given NFL team among their three favorites, and how many identified with a given party (further broken down by race). We also list the percentages of the entire sample that identified with each party, and were of each race.

The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

Source: https://github.com/fivethirtyeight/data

This dataset was created by FiveThirtyEight and contains around 0 samples along with Unnamed: 10, Unnamed: 4, technical information and other features such as: - Unnamed: 3 - Unnamed: 1 - and more.

How to use this dataset

Analyze Unnamed: 13 in relation to Unnamed: 21

Study the influence of Unnamed: 7 on Unnamed: 12

More datasets

Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
A
‘💣 Global Terrorism Database (GTD)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘💣 Global Terrorism Database (GTD)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-global-terrorism-database-gtd-b0b8/e6145d55/?iid=005-149&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘💣 Global Terrorism Database (GTD)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/terrorisme on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

See Readme for more details.
This repository contains a selection of the data -- and the data-processing scripts -- behind the articles, graphics and interactives at FiveThirtyEight.

We hope you'll use it to check our work and to create stories and visualizations of your own. The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know](andrei.scheinkman@fivethirtyeight.com).

Source: https://github.com/fivethirtyeight/data

This dataset was created by FiveThirtyEight and contains around 0 samples along with Ireland, Denmark, technical information and other features such as: - Greece - Luxembourg - and more.

How to use this dataset

Analyze Germany in relation to Italy

Study the influence of France on United Kingdom

More datasets

Acknowledgements

If you use this dataset in your research, please credit FiveThirtyEight

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
Sustainable Development Goals
kaggle.com
zip
Updated Jan 12, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Bank (2019). Sustainable Development Goals [Dataset]. https://www.kaggle.com/theworldbank/sustainable-development-goals
Explore at:
zip(20674194 bytes)Available download formats
Dataset updated
Jan 12, 2019
Dataset authored and provided by
World Bankhttp://worldbank.org/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

Relevant indicators drawn from the World Development Indicators, reorganized according to the goals and targets of the Sustainable Development Goals (SDGs). These indicators may help to monitor SDGs, but they are not always the official indicators for SDG monitoring.

Context

This is a dataset hosted by the World Bank. The organization has an open data platform found here and they update their information according the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using the World Bank's APIs and Kaggle's API.

Cover photo by NA on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
t
Privacy-Sensitive Conversations between Care Workers and Care Home Residents...
researchdata.tuwien.ac.at
test.researchdata.tuwien.ac.at
bin, text/markdown
Updated Feb 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Reinhard Grabler; Reinhard Grabler; Michael Starzinger; Michael Starzinger; Matthias Hirschmanner; Matthias Hirschmanner; Helena Anna Frijns; Helena Anna Frijns (2025). Privacy-Sensitive Conversations between Care Workers and Care Home Residents in a Residential Care Home [Dataset]. http://doi.org/10.48436/q1kt0-edc53
Explore at:
bin, text/markdownAvailable download formats
Unique identifier
https://doi.org/10.48436/q1kt0-edc53
Dataset updated
Feb 25, 2025
Dataset provided by
TU Wien
Authors
Reinhard Grabler; Reinhard Grabler; Michael Starzinger; Michael Starzinger; Matthias Hirschmanner; Matthias Hirschmanner; Helena Anna Frijns; Helena Anna Frijns
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 2024 - Aug 2024
Description
Dataset Card for "privacy-care-interactions"

Table of Contents

Dataset Description

Purpose and Features

Dataset Overview

Language Distribution

Locale Distribution

Key Facts

Dataset Structure

Data Instances

Data Fields

Data Splits

Dataset Creation

Curation Rationale

Source Data

Annotations

Personal and Sensitive Information

Considerations for Using the Data

Social Impact of Dataset

Discussion of Biases

Other Known Limitations

Additional Information

Dataset Curators

Licensing Information

Citation Information

Contributions

Dataset Description

Purpose and Features

🔒 Collection of Privacy-Sensitive Conversations between Care Workers and Care Home Residents in an Residential Care Home 🔒

The dataset is useful to train and evaluate models to identify and classify privacy-sensitive parts of conversations from text, especially in the context of AI assistants and LLMs.

Dataset Overview

Total entries: 95

Number of distinct taxonomy categories in the public dataset: 4

Number of distinct conversational categories in public dataset: 7

Papers:

Continues the work of: Privacy Agents: Utilizing Large Language Models to Safeguard Contextual Integrity in Elderly Care

Continues the work of: Prototype of a care documentation support system using audio recordings of care actions and large language models

Language Distribution 🌍

English (en): 95

Locale Distribution 🌎

United States (US) 🇺🇸: 95

Key Facts 🔑

This is synthetic data! Generated using proprietary algorithms - no privacy violations!

Conversations are classified following the taxonomy for privacy-sensitive robotics by Rueben et al. (2017).

The data was manually labeled by an expert.

Dataset Structure

Data Instances

The provided data format is .jsonl, the JSON Lines text format, also called newline-delimited JSON. An example entry looks as follows.

{ "text": "CW: Have you ever been to Italy? CR: Oh, yes... many years ago.", "taxonomy": 0, "category": 0, "affected_speaker": 1, "language": "en", "locale": "US", "data_type": 1, "uid": 16, "split": "train" }

Data Fields

The data fields are:

text: a string feature. The abbreviaton of the speakers refer to the care worker (CW) and the care recipient (CR).

taxonomy: a classification label, with possible values including informational (0), invasion (1), collection (2), processing (3), dissemination (4), physical (5), personal-space (6), territoriality (7), intrusion (8), obtrusion (9), contamination (10), modesty (11), psychological (12), interrogation (13), psychological-distance (14), social (15), association (16), crowding-isolation (17), public-gaze (18), solitude (19), intimacy (20), anonymity (21), reserve (22). The taxonomy is derived from Rueben et al. (2017). The classifications were manually labeled by an expert.

category: a classification label, with possible values including personal-information (0), family (1), health (2), thoughts (3), values (4), acquaintance (5), appointment (6). The privacy category affected in the conversation. The classifications were manually labeled by an expert.

affected_speaker: a classification label, with possible values including care-worker (0), care-recipient (1), other (2), both (3). The speaker whose privacy is impacted during the conversation. The classifications were manually labeled by an expert.

language: a string feature. Language code as defined by ISO 639.

locale: a string feature. Regional code as defined by ISO 3166-1 alpha-2.

data_type: a string a classification label, with possible values including real (0), synthetic (1).

uid: a int64 feature. A unique identifier within the dataset.

split: a string feature. Either train, validation or test.

Dataset Splits

The dataset has 2 subsets:

split: with a total of 95 examples split into train, validation and test (70%-15%-15%)

unsplit: with a total of 95 examples in a single train split

name train validation test
split 66 14 15
unsplit 95 n/a n/a

The files follow the naming convention subset-split-language.jsonl. The following files are contained in the dataset:

split-train-en.jsonl

split-validation-en.jsonl

split-test-en.jsonl

unsplit-train-en.jsonl

Dataset Creation

Curation Rationale

Recording audio of care workers and residents during care interactions, which includes partial and full body washing, giving of medication, as well as wound care, is a highly privacy-sensitive use case. Therefore, a dataset is created, which includes privacy-sensitive parts of conversations, synthesized from real-world data. This dataset serves as a basis for fine-tuning a local LLM to highlight and classify privacy-sensitive sections of transcripts created in care interactions, to further mask them to protect privacy.

Source Data

Initial Data Collection

The intial data was collected in the project Caring Robots of TU Wien in cooperation with Caritas Wien. One project track aims to facilitate Large Languge Models (LLM) to support documentation of care workers, with LLM-generated summaries of audio recordings of interactions between care workers and care home residents. The initial data are the transcriptions of those care interactions.

Data Processing

The transcriptions were thoroughly reviewed, and sections containing privacy-sensitive information were identified and marked using qualitative data analysis software by two experts. Subsequently, the sections were translated from German to U.S. English using the locally executed LLM icky/translate. In the next step, another llama3.1:70b was used locally to synthesize the conversation segments. This process involved generating similar, yet distinct and new, conversations that are not linked to the original data. The dataset was split using the train_test_split function from the <a href="https://scikit-learn.org/1.5/modules/generated/sklearn.model_selection.train_test_split.html" target="_blank" rel="noopener
MODIS Thermal (Last 7 days)
wifire-data.sdsc.edu
Updated Mar 3, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Esri (2023). MODIS Thermal (Last 7 days) [Dataset]. https://wifire-data.sdsc.edu/dataset/modis-thermal-last-7-days
Explore at:
html, zip, csv, kml, geojson, arcgis geoservices rest apiAvailable download formats
Dataset updated
Mar 3, 2023
Dataset provided by
Esrihttp://esri.com/
Description
This layer presents detectable thermal activity from MODIS satellites for the last 7 days. MODIS Global Fires is a product of NASA’s Earth Observing System Data and Information System (EOSDIS), part of NASA's Earth Science Data. EOSDIS integrates remote sensing and GIS technologies to deliver global MODIS hotspot/fire locations to natural resource managers and other stakeholders around the World.

Consumption Best Practices:

As a service that is subject to Viral loads (very high usage), avoid adding Filters that use a Date/Time type field. These queries are not cacheable and WILL be subject to 'https://en.wikipedia.org/wiki/Rate_limiting' rel='nofollow ugc'>Rate Limiting by ArcGIS Online. To accommodate filtering events by Date/Time, we encourage using the included "Age" fields that maintain the number of Days or Hours since a record was created or last modified compared to the last service update. These queries fully support the ability to cache a response, allowing common query results to be supplied to many users without adding load on the service.
When ingesting this service in your applications, avoid using POST requests, these requests are not cacheable and will also be subject to Rate Limiting measures.
Source: NASA FIRMS - Active Fire Data - for World

Scale/Resolution: 1km

Update Frequency: 1/2 Hour (every 30 minutes) using the Aggregated Live Feed Methodology

Area Covered: World

What can I do with this layer?
The MODIS thermal activity layer can be used to visualize and assess wildfires worldwide. However, it should be noted that this dataset contains many “false positives” (e.g., oil/natural gas wells or volcanoes) since the satellite will detect any large thermal signal.

Additional Information
MODIS stands for MODerate resolution Imaging Spectroradiometer. The MODIS instrument is on board NASA’s Earth Observing System (EOS) Terra (EOS AM) and Aqua (EOS PM) satellites. The orbit of the Terra satellite goes from north to south across the equator in the morning and Aqua passes south to north over the equator in the afternoon resulting in global coverage every 1 to 2 days. The EOS satellites have a ±55 degree scanning pattern and orbit at 705 km with a 2,330 km swath width.

It takes approximately 2 – 4 hours after satellite overpass for MODIS Rapid Response to process the data, and for the Fire Information for Resource Management System (FIRMS) to update the website. Occasionally, hardware errors can result in processing delays beyond the 2-4 hour range. Additional information on the MODIS system status can be found at MODIS Rapid Response.

Attribute Information
Latitude and Longitude: The center point location of the 1km (approx.) pixel flagged as containing one or more fires/hotspots (fire size is not 1km, but variable). Stored by Point Geometry. See What does a hotspot/fire detection mean on the ground?
Brightness: The brightness temperature measured (in Kelvin) using the MODIS channels 21/22 and channel 31.
Scan and Track: The actual spatial resolution of the scanned pixel. Although the algorithm works at 1km resolution, the MODIS pixels get bigger toward the edge of the scan. See What does scan and track mean?
Date and Time: Acquisition date of the hotspot/active fire pixel and time of satellite overpass in UTC (client presentation in local time). Stored by Acquisition Date.
Acquisition Date: Derived Date/Time field combining Date and Time attributes.
Satellite: Whether the detection was picked up by the Terra or Aqua satellite.
Confidence: The detection confidence is a quality flag of the individual hotspot/active fire pixel.
Version: Version refers to the processing collection and source of data. The number before the decimal refers to the collection (e.g. MODIS Collection 6). The number after the decimal indicates the source of Level 1B data; data processed in near-real time by MODIS Rapid Response will have the source code “CollectionNumber.0”. Data sourced from MODAPS (with a 2-month lag) and processed by FIRMS using the standard MOD14/MYD14 Thermal Anomalies algorithm will have a source code “CollectionNumber.x”. For example, data with the version listed as 5.0 is collection 5, processed by MRR, data with the version listed as 5.1 is collection 5 data processed by FIRMS using Level 1B data from MODAPS.
Bright.T31: Channel 31 brightness temperature (in Kelvins) of the hotspot/active fire pixel.
FRP: Fire Radiative Power. Depicts the pixel-integrated fire radiative power in MW (MegaWatts). FRP provides information on the measured radiant heat output of detected fires. The amount of radiant heat energy liberated per unit time (the Fire Radiative Power) is thought to be related to the rate at which fuel is being consumed (Wooster et. al. (2005)).
DayNight: The standard processing algorithm uses the solar zenith angle (SZA) to threshold the day/night value; if the SZA exceeds 85 degrees it is assigned a night value. SZA values less than 85 degrees are assigned a day time value. For the NRT algorithm the day/night flag is assigned by ascending (day) vs descending (night) observation. It is expected that the NRT assignment of the day/night flag will be amended to be consistent with the standard processing.
Hours Old: Derived field that provides age of record in hours between Acquisition date/time and latest update date/time. 0 = less than 1 hour ago, 1 = less than 2 hours ago, 2 = less than 3 hours ago, and so on.
Revisions
June 22, 2022: Added 'HOURS_OLD' field to enhance Filtering data. Added 'Last 7 days' Layer to extend data to match time range of VIIRS offering. Added Field level descriptions.
This map is provided for informational purposes and is not monitored 24/7 for accuracy and
A
‘Predicting Women's NBA (WNBA)’ analyzed by Analyst-2
analyst-2.ai
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘Predicting Women's NBA (WNBA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-predicting-women-s-nba-wnba-dbae/latest
Explore at:
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Predicting Women's NBA (WNBA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/wnba-forecastse on 28 January 2022.

--- Dataset description provided by original source is as follows ---

https://i.ibb.co/4dcHDh5/WNBA.png" alt="">

About this dataset

About

This file contains links to the data behind our WNBA Predictions. More information on how our WNBA Elo model works can be found in this article.

wnba_elo.csv contains game-by-game Elo ratings and forecasts since 1997.

wnba_elo_latest.csv contains game-by-game Elo ratings and forecasts for only the latest season.

License

Data released under the Creative Commons Attribution 4.0 License

Source

GitHub

This dataset was created by data.world's Admin and contains around 6000 samples along with Home Team Postgame Rating, Home Team, technical information and other features such as: - Date - Away Team - and more.

How to use this dataset

Analyze Neutral in relation to Home Team Pregame Rating

Study the influence of Away Team Postgame Rating on Season

More datasets

Acknowledgements

If you use this dataset in your research, please credit data.world's Admin

Start A New Notebook!

--- Original source retains full ownership of the source dataset ---
Z
CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company...
data.niaid.nih.gov
zenodo.org
Updated Jun 4, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Richard Anselmo Stahl (2024). CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7957401
Explore at:
Dataset updated
Jun 4, 2024
Dataset provided by
Mark Granroth-Wilding
Vilhelm von Ehrenheim
Dhiana Deva Cavacanti Rocha
Richard Anselmo Stahl
Armin Catovic
Lele Cao
Drew McCornack
Description
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.

Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.

Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.

Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:

Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.

Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. It contains 76 distinct target companies, each of which has 5.3 competitors annotated in average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.

Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. It resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.

Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).

Background and Motivation

In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.

While there is no universally agreed definition of company similarity, researchers and practitioners in PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customers' review, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.

In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.

However, graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.

Source Code and Tutorial:https://github.com/llcresearch/CompanyKG2

Paper: to be published
a
WDPA - World Database on Protected Areas polygons from WCMC
hub.arcgis.com
globil-panda.opendata.arcgis.com
Updated Dec 30, 2016
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
World Wide Fund for Nature (2016). WDPA - World Database on Protected Areas polygons from WCMC [Dataset]. https://hub.arcgis.com/maps/61cde74cf99645b7b2c30212514ddae5
Explore at:
Dataset updated
Dec 30, 2016
Dataset authored and provided by
World Wide Fund for Nature
Area covered
Description
The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas and is one of the key global biodiversity datasets being widely used by scientists, businesses, governments, International secretariats and others to inform planning, policy decisions and management.The WDPA is a joint project between the United Nations Environment Programme (UNEP) and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by UNEP World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data which are made available online through the Protected Planet website where the data is both viewable and downloadable.Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets.Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary.The reach of the WDPA is further enhanced in services developed by other parties, such as theGlobal Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.For more details on the WDPA please read through the WDPA User Manual.

name	train	validation	test
split	66	14	15
unsplit	95	n/a	n/a

Facebook

Twitter

Click to copy link

Link copied

Cite

Secretariat of the Pacific Regional Environment Programme (2025). World Database on Protected Areas [Dataset]. https://kiribati-data.sprep.org/dataset/world-database-protected-areas-0

Data from: World Database on Protected Areas

Explore at:

jpg(577876), pdf(2100272)Available download formats

Dataset updated

Feb 20, 2025

Dataset provided by

Pacific Regional Environment Programmehttps://www.sprep.org/

License

Public Domain Mark 1.0https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically

Area covered

POLYGON ((205.20703554153 -28.505385171432, 129.26953554153 1.4588219018416, 204.92578983307 6.2279312638895)), 167.23828554153 25.085596467854, 141.3632941246 -0.22851555560937, 129.26953554153 27.605668449605, 160.20703554153 -29.489341672009, 141.92578554153 -11.126668087769, Pacific Region

Description

The World Database on Protected Areas (WDPA) is the most comprehensive global database of marine and terrestrial protected areas, updated on a monthly basis, and is one of the key global biodiversity data sets being widely used by scientists, businesses, governments, International secretariats and others to inform planning, policy decisions and management. The WDPA is a joint project between UN Environment and the International Union for Conservation of Nature (IUCN). The compilation and management of the WDPA is carried out by UN Environment World Conservation Monitoring Centre (UNEP-WCMC), in collaboration with governments, non-governmental organisations, academia and industry. There are monthly updates of the data which are made available online through the Protected Planet website where the data is both viewable and downloadable. Data and information on the world's protected areas compiled in the WDPA are used for reporting to the Convention on Biological Diversity on progress towards reaching the Aichi Biodiversity Targets (particularly Target 11), to the UN to track progress towards the 2030 Sustainable Development Goals, to some of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES) core indicators, and other international assessments and reports including the Global Biodiversity Outlook, as well as for the publication of the United Nations List of Protected Areas. Every two years, UNEP-WCMC releases the Protected Planet Report on the status of the world's protected areas and recommendations on how to meet international goals and targets. Many platforms are incorporating the WDPA to provide integrated information to diverse users, including businesses and governments, in a range of sectors including mining, oil and gas, and finance. For example, the WDPA is included in the Integrated Biodiversity Assessment Tool, an innovative decision support tool that gives users easy access to up-to-date information that allows them to identify biodiversity risks and opportunities within a project boundary. The reach of the WDPA is further enhanced in services developed by other parties, such as the Global Forest Watch and the Digital Observatory for Protected Areas, which provide decision makers with access to monitoring and alert systems that allow whole landscapes to be managed better. Together, these applications of the WDPA demonstrate the growing value and significance of the Protected Planet initiative.

Clear search

Close search

Google apps

Main menu

Data from: World Database on Protected Areas

Global Green Economy Index (GGEI)

Protected Areas Database of the United States (PAD-US) 2.1 - World Database...

Data for "Prediction of Search Targets From Fixations in Open-World...

Learning Poverty Global Database

Datasets used for model learning and validation.

Data from: FISBe: A real-world benchmark dataset for instance segmentation...

General

Summary

Abstract

Dataset documentation:

Files

How to work with the image files

How to open zarr files

How to view zarr image files

Metrics

Baseline

License

Citation

Acknowledgments

Changelog

Contributing

On Premises Real Time Database Market Report | Global Forecast From 2025 To...

On Premises Real Time Database Market Outlook

Component Analysis

‘Climate Change Dataset’ analyzed by Analyst-2

About this dataset

How to use this dataset

Acknowledgements

‘What The World Thinks Of Trump?’ analyzed by Analyst-2

About this dataset

How to use this dataset

Acknowledgements

Start A New Notebook!

‘50 Years Of World Cup Doppelgangers’ analyzed by Analyst-2

About this dataset

How to use this dataset

Acknowledgements

Start A New Notebook!

Statistically downscaled climate indices from CMIP6 global climate models...

‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2

About this dataset

How to use this dataset

Acknowledgements

Start A New Notebook!

‘💣 Global Terrorism Database (GTD)’ analyzed by Analyst-2

About this dataset

How to use this dataset

Acknowledgements

Start A New Notebook!

Sustainable Development Goals

Content

Context

Acknowledgements

Privacy-Sensitive Conversations between Care Workers and Care Home Residents...

Dataset Card for "privacy-care-interactions"

Table of Contents

Dataset Description

Purpose and Features

Dataset Overview

Language Distribution 🌍

Locale Distribution 🌎

Key Facts 🔑

Dataset Structure

Data Instances

Data Fields

Dataset Splits

Dataset Creation

Curation Rationale

Source Data

Initial Data Collection

Data Processing

MODIS Thermal (Last 7 days)

‘Predicting Women's NBA (WNBA)’ analyzed by Analyst-2

About this dataset

About

License

Source

How to use this dataset

Acknowledgements

Data from: World Database on Protected Areas