32 datasets found
  1. Spatio-temporal linking of multiple SAR satellite data from medium and high...

    • b2find.dkrz.de
    Updated Sep 21, 2021
    Cite
    (2021). Spatio-temporal linking of multiple SAR satellite data from medium and high resolution Radarsat-2 images - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/74b90d30-c365-5648-a3bd-1d3fed53db7a
    Dataset updated
    Sep 21, 2021
    Description

    A recent development in Interferometric Synthetic Aperture Radar (InSAR) technology is integrating multiple SAR satellite data to dynamically extract ground features. This paper addresses two relevant challenges: identification of common ground targets from different SAR datasets in space, and concatenation of time series when dealing with temporal dynamics. To address the first challenge, we describe the geolocation uncertainty of InSAR measurements as a three-dimensional error ellipsoid. The points, among InSAR measurements, which have error ellipsoids with a positive cross volume are identified as tie-point pairs representing common ground objects from multiple SAR datasets. The cross volumes are calculated using Monte Carlo methods and serve as weights to achieve the equivalent deformation time series. To address the second challenge, the deformation time series model for each tie-point pair is estimated using probabilistic methods, where potential deformation models are efficiently tested and evaluated. As an application, we integrated two Radarsat-2 datasets in Standard and Extra-Fine modes to map the subsidence of the west of the Netherlands between 2010 and 2017. We identified 18128 tie-point pairs, 5 intersection types of error ellipsoids, 5 deformation models, and constructed their long-term deformation time series. The detected maximum mean subsidence velocity in Line-Of-Sight direction is up to 15mm/yr. We conclude that our method removes limitations that exist in single-viewing-geometry SAR when integrating multiple SAR data. In particular, the proposed time-series modeling method is useful to achieve a long-term deformation time series of multiple datasets.
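
    The Monte Carlo estimate of the cross volume mentioned above can be illustrated with a minimal sketch. It assumes axis-aligned ellipsoids given by a centre and three semi-axes, which is a simplification (the error ellipsoids in the paper may be arbitrarily oriented), and the function names `inside` and `cross_volume` are hypothetical, not taken from the authors' code.

```python
import random


def inside(point, center, semi_axes):
    """Return True if point lies inside an axis-aligned ellipsoid."""
    return sum(((p - c) / a) ** 2
               for p, c, a in zip(point, center, semi_axes)) <= 1.0


def cross_volume(center_a, axes_a, center_b, axes_b,
                 n_samples=100_000, seed=0):
    """Estimate the intersection volume of two axis-aligned ellipsoids.

    Points are drawn uniformly from the bounding box of ellipsoid A
    (the intersection cannot extend beyond it); the fraction that falls
    inside both ellipsoids, times the box volume, estimates the overlap.
    """
    rng = random.Random(seed)
    lows = [c - a for c, a in zip(center_a, axes_a)]
    highs = [c + a for c, a in zip(center_a, axes_a)]
    box_volume = 1.0
    for lo, hi in zip(lows, highs):
        box_volume *= hi - lo
    hits = 0
    for _ in range(n_samples):
        point = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        if inside(point, center_a, axes_a) and inside(point, center_b, axes_b):
            hits += 1
    return box_volume * hits / n_samples
```

    Under this sketch, a pair of points would count as a tie-point pair when the estimated volume is positive, and the volume itself could serve as the weight.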

  2. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Available download formats: zip (133186454988 bytes)
    Dataset updated
    Mar 20, 2025
    Dataset authored and provided by
    Kaggle: http://kaggle.com/
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
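
    As a rough illustration of the layout described above, the following sketch maps a KernelVersions id to its two-level folder. The function name `version_folder` is hypothetical, and the sketch assumes folder names are not zero-padded (the description does not say whether, for example, subfolder 45 is written as 45 or 045).

```python
from pathlib import PurePosixPath


def version_folder(version_id: int) -> PurePosixPath:
    """Return the '<top>/<sub>' folder for a kernel version id.

    The top-level folder is the id divided by one million; the subfolder
    is the thousands group within that million.
    Assumption: folder names are plain integers without zero padding.
    """
    if version_id < 0:
        raise ValueError("version id must be non-negative")
    top = version_id // 1_000_000
    sub = (version_id // 1_000) % 1_000
    return PurePosixPath(str(top), str(sub))


# Version 123,456,789 falls in the range 123,456,000 to 123,456,999.
print(version_folder(123_456_789))
```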

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  3. On-street Parking Bays

    • researchdata.edu.au
    • data.melbourne.vic.gov.au
    Updated Mar 7, 2023
    Cite
    data.vic.gov.au (2023). On-street Parking Bays [Dataset]. https://researchdata.edu.au/on-street-parking-bays/2296305
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    data.vic.gov.au
    Description

    Upcoming Changes: Please note that our parking system is being improved and this dataset may be disrupted. See more information here.

    This dataset contains spatial polygons which represent parking bays across the city. Each bay can also link to its parking meter and parking sensor information.

    How the data joins:

    There are three datasets that make up the live parking sensor release: the on-street parking bay sensors, the on-street parking bays, and the on-street car park bay information.
    The datasets join as follows. The on-street parking bay sensors join to the on-street parking bays by the marker_id attribute. The on-street parking bay sensors join to the on-street car park bay restrictions by the bay_id attribute. The on-street parking bays and the on-street car park bay information don’t currently join.

    Please see City of Melbourne's disclaimer regarding the use of this data. https://data.melbourne.vic.gov.au/stories/s/94s9-uahn
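
    The join described above (sensors to bays by marker_id, sensors to restrictions by bay_id) can be sketched with plain dictionaries. Only the two key names come from the description; the helper names, the other fields, and the sample records are made up for illustration, and the sketch assumes each key value is unique within its dataset.

```python
def index_by(rows, key):
    """Build a lookup from a key column to its row (assumes unique keys)."""
    return {row[key]: row for row in rows}


def join_sensor_release(sensors, bays, restrictions):
    """Attach the matching bay and restriction record to each sensor row.

    Sensors are the hub: bays match on marker_id, restrictions on bay_id.
    A missing match is reported as None rather than dropping the sensor.
    """
    bays_by_marker = index_by(bays, "marker_id")
    restrictions_by_bay = index_by(restrictions, "bay_id")
    return [
        {
            "sensor": sensor,
            "bay": bays_by_marker.get(sensor.get("marker_id")),
            "restriction": restrictions_by_bay.get(sensor.get("bay_id")),
        }
        for sensor in sensors
    ]


# Made-up sample records, not taken from the real datasets.
sensors = [
    {"bay_id": 1, "marker_id": "A1"},
    {"bay_id": 2, "marker_id": "B7"},
]
bays = [{"marker_id": "A1", "geometry": "polygon placeholder"}]
restrictions = [{"bay_id": 1, "note": "restriction placeholder"}]

joined = join_sensor_release(sensors, bays, restrictions)
```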

  4. Data from: Automated Annotation of Untargeted All-Ion Fragmentation LC–MS...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Gonçalo Graça; Yuheng Cai; Chung-Ho E. Lau; Panagiotis A. Vorkas; Matthew R. Lewis; Elizabeth J. Want; David Herrington; Timothy M. D. Ebbels (2023). Automated Annotation of Untargeted All-Ion Fragmentation LC–MS Metabolomics Data with MetaboAnnotatoR [Dataset]. http://doi.org/10.1021/acs.analchem.1c03032.s002
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    ACS Publications
    Authors
    Gonçalo Graça; Yuheng Cai; Chung-Ho E. Lau; Panagiotis A. Vorkas; Matthew R. Lewis; Elizabeth J. Want; David Herrington; Timothy M. D. Ebbels
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Untargeted metabolomics and lipidomics LC–MS experiments produce complex datasets, usually containing tens of thousands of features from thousands of metabolites whose annotation requires additional MS/MS experiments and expert knowledge. All-ion fragmentation (AIF) LC–MS/MS acquisition provides fragmentation data at no additional experimental time cost. However, analysis of such datasets requires reconstruction of parent–fragment relationships and annotation of the resulting pseudo-MS/MS spectra. Here, we propose a novel approach for automated annotation of isotopologues, adducts, and in-source fragments from AIF LC–MS datasets by combining correlation-based parent–fragment linking with molecular fragment matching. Our workflow focuses on a subset of features rather than trying to annotate the full dataset, saving time and simplifying the process. We demonstrate the workflow in three human serum datasets containing 599 features manually annotated by experts. Precision and recall values of 82–92% and 82–85%, respectively, were obtained for features found in the highest-rank scores (1–5). These results equal or outperform those obtained using MS-DIAL software, the current state of the art for AIF data annotation. Further validation for other biological matrices and different instrument types showed variable precision (60–89%) and recall (10–88%) particularly for datasets dominated by nonlipid metabolites. The workflow is freely available as an open-source R package, MetaboAnnotatoR, together with the fragment libraries from Github (https://github.com/gggraca/MetaboAnnotatoR).

  5. Data and code to model the marine distribution of the European sturgeon by...

    • b2find.dkrz.de
    Updated Mar 20, 2024
    Cite
    (2024). Data and code to model the marine distribution of the European sturgeon by joining global change and recovery scenarios - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/2f88e510-d421-5cb6-9942-2838fa75bd0c
    Dataset updated
    Mar 20, 2024
    Area covered
    Europe
    Description

    Data and R Code for the manuscript entitled "Travelling away from home? Joining global change and recovery scenarios to anticipate the marine distribution of diadromous fish" published in Ecological Indicators. This deposit contains:

    • The PDF file (Read_me.pdf) describing in more detail the Data and R Code as well as the study conducted.

    • The CSV data file (DATASET_sturio.csv) including the data frame with presences (1990-2022) and pseudo-absences of the European sturgeon, and the five final variables selected in the final ensemble model. Future environmental variables used to make projections are also displayed for three temporal periods: FUTUR1 (2023-2052), FUTUR2 (2047-2076) and FUTUR3 (2070-2099). This file also contains the dispersal variables (i.e. distance to home) used to make projections in twelve river systems (10 currently unoccupied rivers with simulated stocking and 2 rivers with currently sustained population), to simulate recovery scenarios. The grid cell resolution is 10 x 10 km.

    • The R Code (Script_Calibrate_Project_Ensemble_Models.R) including data input and formatting, the ensemble modelling approach as well as the projections under scenarios of global change and population recovery, using the R package 'biomod2'.

    The use of these data is authorized within the framework of the reproducibility of this present study. For any other use, a specific request must be made to the authors. Do not hesitate to contact the authors for any information.

  6. TM-Link

    • researchdata.edu.au
    • data.gov.au
    • +1more
    Updated Apr 18, 2019
    Cite
    IP Australia (2019). TM-Link [Dataset]. https://researchdata.edu.au/tm-link/2980846
    Dataset updated
    Apr 18, 2019
    Dataset provided by
    data.gov.au
    Authors
    IP Australia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Dictionary available here

    2020 August UPDATE

    • Fixed some quality issues with dates in the application table.

    • Fixed some quality issues with names in the owner table.

    2020 March UPDATE

    • The TM-Link dataset has been updated to include more recent data and also has expanded to include more information. As such, the structure of the dataset has also changed.

    TM-Link is an international dataset IP Australia and Swinburne University have developed in collaboration. The dataset provides information from various jurisdictions, modelled under a common schema for greater accessibility to researchers and analysts. TM-Link also links together similar trade marks from different countries based on common information, such as similar trade mark phrases and applicant names. These links identify families of international trade marks, which provide a new and unique insight into international branding trends and export behaviours.

    IP Australia and Swinburne University are looking to continually develop TM-Link to become a core part of the global IP data landscape. If you have any suggestions or requests to model any additional data points, or improve the current accuracy of the data please let us know via email to ipdataplatform@ipaustralia.gov.au.

    For more information on the linking algorithm, please see:

    Petrie S, Kollmann T, Codoreanu A, Thomson R & Webster E (2019); International Trademarking and Regional Export Performance. Available at SSRN: https://ssrn.com/abstract=3445244

    For more information on TM-Link data collection and descriptive analyses, please see:

    Petrie S, Adams M, Mitra‐Kahn B, Johnson M, Thomson R, Jensen PH, Palangkaraya A, & Webster EM (2019); TM-Link: An Internationally Linked Trade Mark Database. Australian Economic Review, Forthcoming. Available at SSRN: https://ssrn.com/abstract=3511526

  7. Alphabetic Decision Task (Arial Light Font)

    • openneuro.org
    Updated Nov 9, 2024
    Cite
    Jack E. Taylor; Rasmus Sinn; Cosimo Iaia; Christian J. Fiebach (2024). Alphabetic Decision Task (Arial Light Font) [Dataset]. http://doi.org/10.18112/openneuro.ds005594.v1.0.2
    Dataset updated
    Nov 9, 2024
    Dataset provided by
    OpenNeuro: https://openneuro.org/
    Authors
    Jack E. Taylor; Rasmus Sinn; Cosimo Iaia; Christian J. Fiebach
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Generated from raw data by MNE-BIDS (Appelhoff et al., 2019) and custom code to join to behavioural data, stimulus information, and metadata.

    Notes on the Data

    For full details on this dataset, see our preprint: (url here once out)

    • An issue during recording meant that sub-05 completed the first block without data being saved. The experiment was restarted from the beginning for this participant. This participant was not included in our analyses, but the data are included in this dataset. They are also identified with the recording_restarted field in participants.tsv.

    • A separate issue during recording meant that EEG data for some trials were lost for sub-01, though enough trials were recorded in total to meet our criteria for inclusion in the analysis. The raw data comprised two separate recordings. In this dataset, the two recordings are concatenated end-to-end into one file. The point at which the files are joined is marked with a boundary event. This participant is identified with the recording_interrupted field in participants.tsv.

    • During the course of the experiment, we identified an issue with the wiring in one splitter box, which meant that voltages from channels FT7 and FC3 were swapped in the raw recorded data. We elected to keep the wiring as it was for the duration of the experiment, and then swapped the data from the two channels in the code that generated this BIDS dataset. This means that this issue has been corrected in this BIDS version of the data.

    • "BAD" periods (MNE term) for key presses and break periods are included in the events files.

    • Recording dates/times have been anonymised by shifting all recordings backwards in time by a constant number of days (same constant for all participants). This obscures information that may be used to identify participants, but preserves time-of-day information, and the relative times elapsed between different recordings.
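
    The date anonymisation in the last note can be illustrated with a short sketch: subtracting the same whole number of days from every recording keeps the time of day and the gaps between recordings unchanged. The function name, the participant timestamps, and the offset of 1000 days are invented for illustration; the actual constant used for this dataset is not stated.

```python
from datetime import datetime, timedelta


def shift_recordings(recordings, days_back):
    """Shift every recording start backwards by the same number of days.

    recordings maps a participant label to a datetime; days_back is the
    shared constant (hypothetical here, the real value is not published).
    """
    offset = timedelta(days=days_back)
    return {label: start - offset for label, start in recordings.items()}


# Invented example timestamps.
original = {
    "sub-01": datetime(2024, 3, 5, 9, 30),
    "sub-02": datetime(2024, 3, 7, 14, 0),
}
shifted = shift_recordings(original, days_back=1000)
```

    Because the offset is a whole number of days and identical for everyone, `shifted["sub-02"] - shifted["sub-01"]` equals the original difference.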

    References

    Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Höchenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A. and Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software 4: (1896). https://doi.org/10.21105/joss.01896

    Pernet, C. R., Appelhoff, S., Gorgolewski, K. J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific Data, 6, 103. https://doi.org/10.1038/s41597-019-0104-8

  8. Effect of data source on estimates of regional bird richness in northeastern...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated May 4, 2021
    Cite
    Roi Ankori-Karlinsky; Ronen Kadmon; Michael Kalyuzhny; Katherine F. Barnes; Andrew M. Wilson; Curtis Flather; Rosalind Renfrew; Joan Walsh; Edna Guk (2021). Effect of data source on estimates of regional bird richness in northeastern United States [Dataset]. http://doi.org/10.5061/dryad.m905qfv0h
    Available download formats: zip
    Dataset updated
    May 4, 2021
    Dataset provided by
    Hebrew University of Jerusalem
    University of Michigan–Ann Arbor
    Columbia University
    Agricultural Research Service
    New York State Department of Environmental Conservation
    Massachusetts Audubon Society
    Gettysburg College
    University of Vermont
    Authors
    Roi Ankori-Karlinsky; Ronen Kadmon; Michael Kalyuzhny; Katherine F. Barnes; Andrew M. Wilson; Curtis Flather; Rosalind Renfrew; Joan Walsh; Edna Guk
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Northeastern United States, United States
    Description

    Standardized data on large-scale and long-term patterns of species richness are critical for understanding the consequences of natural and anthropogenic changes in the environment. The North American Breeding Bird Survey (BBS) is one of the largest and most widely used sources of such data, but so far, little is known about the degree to which BBS data provide accurate estimates of regional richness. Here we test this question by comparing estimates of regional richness based on BBS data with spatially and temporally matched estimates based on state Breeding Bird Atlases (BBA). We expected that estimates based on BBA data would provide a more complete (and therefore, more accurate) representation of regional richness due to their larger number of observation units and higher sampling effort within the observation units. Our results were only partially consistent with these predictions: while estimates of regional richness based on BBA data were higher than those based on BBS data, estimates of local richness (number of species per observation unit) were higher in BBS data. The latter result is attributed to higher land-cover heterogeneity in BBS units and higher effectiveness of bird detection (more species are detected per unit time). Interestingly, estimates of regional richness based on BBA blocks were higher than those based on BBS data even when differences in the number of observation units were controlled for. Our analysis indicates that this difference was due to higher compositional turnover between BBA units, probably due to larger differences in habitat conditions between BBA units and a larger number of geographically restricted species. Our overall results indicate that estimates of regional richness based on BBS data suffer from incomplete detection of a large number of rare species, and that corrections of these estimates based on standard extrapolation techniques are not sufficient to remove this bias. 
    Future applications of BBS data in ecology and conservation, and in particular, applications in which the representation of rare species is important (e.g., those focusing on biodiversity conservation), should be aware of this bias, and should integrate BBA data whenever possible.

    Methods Overview

    This is a compilation of second-generation breeding bird atlas data and corresponding breeding bird survey data. It contains presence-absence breeding bird observations in 5 U.S. states (MA, MI, NY, PA, VT), sampling effort per sampling unit, geographic location of sampling units, and environmental variables per sampling unit: elevation and elevation range (from SRTM), mean annual precipitation and mean summer temperature (from PRISM), and NLCD 2006 land-use data.

    Each row contains all observations per sampling unit, with additional tables containing information on sampling effort impact on richness, a rareness table of species per dataset, and two summary tables for both bird diversity and environmental variables.

    The methods for compilation are contained in the supplementary information of the manuscript but also here:

    Bird data

    For BBA data, shapefiles for blocks and the data on species presences and sampling effort in blocks were received from the atlas coordinators. For BBS data, shapefiles for routes and raw species data were obtained from the Patuxent Wildlife Research Center (https://databasin.org/datasets/02fe0ebbb1b04111b0ba1579b89b7420 and https://www.pwrc.usgs.gov/BBS/RawData).

    Using ArcGIS Pro© 10.0, species observations were joined to respective BBS and BBA observation units shapefiles using the Join Table tool. For both BBA and BBS, a species was coded as either present (1) or absent (0). Presence in a sampling unit was based on codes 2, 3, or 4 in the original volunteer birding checklist codes (possible breeder, probable breeder, and confirmed breeder, respectively), and absence was based on codes 0 or 1 (not observed and observed but not likely breeding). Spelling inconsistencies of species names between BBA and BBS datasets were fixed. Species that needed spelling fixes included Brewer’s Blackbird, Cooper’s Hawk, Henslow’s Sparrow, Kirtland’s Warbler, LeConte’s Sparrow, Lincoln’s Sparrow, Swainson’s Thrush, Wilson’s Snipe, and Wilson’s Warbler. In addition, naming conventions were matched between BBS and BBA data. The Alder and Willow Flycatchers were lumped into Traill’s Flycatcher and regional races were lumped into a single species column: Dark-eyed Junco regional types were lumped together into one Dark-eyed Junco, Yellow-shafted Flicker was lumped into Northern Flicker, Saltmarsh Sparrow and the Saltmarsh Sharp-tailed Sparrow were lumped into Saltmarsh Sparrow, and the Yellow-rumped Myrtle Warbler was lumped into Myrtle Warbler (currently named Yellow-rumped Warbler). Three hybrid species were removed: Brewster's and Lawrence's Warblers and the Mallard x Black Duck hybrid. Established “exotic” species were included in the analysis since we were concerned only with detection of richness and not of specific species.

    The resultant species tables with sampling effort were pivoted horizontally so that every row was a sampling unit and each species observation was a column. This was done for each state using R version 3.6.2 (R© 2019, The R Foundation for Statistical Computing Platform) and all state tables were merged to yield one BBA and one BBS dataset. Following the joining of environmental variables to these datasets (see below), BBS and BBA data were joined using rbind.data.frame in R© to yield a final dataset with all species observations and environmental variables for each observation unit.
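
    The presence coding and the pivot to one row per sampling unit can be sketched as follows. This is an illustration in Python rather than the authors' R and ArcGIS workflow; the function names and the sample records are hypothetical, and treating a species with no record in a unit as absent (0) is an assumption of the sketch.

```python
PRESENT_CODES = {2, 3, 4}  # possible, probable, confirmed breeder
ABSENT_CODES = {0, 1}      # not observed, observed but not likely breeding


def to_presence(code):
    """Map an original checklist code to present (1) or absent (0)."""
    if code in PRESENT_CODES:
        return 1
    if code in ABSENT_CODES:
        return 0
    raise ValueError(f"unexpected breeding code: {code}")


def pivot_wide(records):
    """Turn (unit, species, code) records into one row per sampling unit.

    Each row maps every species seen anywhere in records to 0 or 1.
    Assumption: a species with no record in a unit counts as absent.
    """
    species = sorted({sp for _, sp, _ in records})
    table = {}
    for unit, sp, code in records:
        row = table.setdefault(unit, {name: 0 for name in species})
        row[sp] = max(row[sp], to_presence(code))
    return table


# Invented sample records for two sampling units.
records = [
    ("block_1", "Northern Flicker", 4),
    ("block_1", "Myrtle Warbler", 1),
    ("block_2", "Myrtle Warbler", 2),
]
wide = pivot_wide(records)
```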

    Environmental data

    Using ArcGIS Pro© 10.0, all environmental raster layers, BBA and BBS shapefiles, and the species observations were integrated in a common coordinate system (North_America Equidistant_Conic) using the Project tool. For BBS routes, 400m buffers were drawn around each route using the Buffer tool. The observation unit shapefiles for all states were merged (separately for BBA blocks and BBS routes and 400m buffers) using the Merge tool to create a study-wide shapefile for each data source. Whether or not a BBA block was adjacent to a BBS route was determined using the Intersect tool based on a radius of 30m around the route buffer (to fit the NLCD map resolution). Area and length of the BBS route inside the proximate BBA block were also calculated. Mean values for annual precipitation and summer temperature, and mean and range for elevation, were extracted for every BBA block and 400m buffer BBS route using Zonal Statistics as Table tool. The area of each land-cover type in each observation unit (BBA block and BBS buffer) was calculated from the NLCD layer using the Zonal Histogram tool.

  9. CDFLOW - A 30-year dataset on CO2 in flowing freshwaters in the United...

    • figshare.com
    txt
    Updated Oct 27, 2022
    Cite
    Timothy Toavs; Steve Midway; Caleb Hasler; Cory Suski (2022). CDFLOW - A 30-year dataset on CO2 in flowing freshwaters in the United States [Dataset]. http://doi.org/10.6084/m9.figshare.19787326.v2
    Available download formats: txt
    Dataset updated
    Oct 27, 2022
    Dataset provided by
    figshare
    Figshare: http://figshare.com/
    Authors
    Timothy Toavs; Steve Midway; Caleb Hasler; Cory Suski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    CDFLOW consists of CO2 estimates in flowing freshwaters across the lower 48 US states from 1990 through 2020. CO2 estimates were generated by downloading US water quality data, including pH, temperature, and total alkalinity measurements, and using the USGS program PHREEQC to estimate CO2. Site data for CDFLOW were generated by spatially joining lat/long data from the downloaded water quality data with the EPA's NHDPlus dataset to the nearest streamcatchment feature. NHDPlus streamcatchment features are geometric features of unique stream sections denoted by a unique COMID code. For a more detailed look at the creation of this dataset view (link in progress).

    To download the data-generating code and supplemental R scripts, use the GitHub link below:

    https://github.com/ttoavs/CDFLOW

  10. Link-prediction on Biomedical Knowledge Graphs

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 25, 2024
    Cite
    Alberto Cattaneo; Daniel Justus; Stephen Bonner; Thomas Martynec (2024). Link-prediction on Biomedical Knowledge Graphs [Dataset]. http://doi.org/10.5281/zenodo.12097377
    Available download formats: zip
    Dataset updated
    Jun 25, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Alberto Cattaneo; Daniel Justus; Stephen Bonner; Thomas Martynec
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Time period covered
    Jun 25, 2021
    Description

    Release of the experimental data from the paper Towards Linking Graph Topology to Model Performance for Biomedical Knowledge Graph Completion (accepted at Machine Learning for Life and Material Sciences workshop @ ICML2024).

    Knowledge Graph Completion has been increasingly adopted as a useful method for several tasks in biomedical research, like drug repurposing or drug-target identification. To that end, a variety of datasets and Knowledge Graph Embedding models has been proposed over the years. However, little is known about the properties that render a dataset useful for a given task and, even though theoretical properties of Knowledge Graph Embedding models are well understood, their practical utility in this field remains controversial. We conduct a comprehensive investigation into the topological properties of publicly available biomedical Knowledge Graphs and establish links to the accuracy observed in real-world applications. By releasing all model predictions we invite the community to build upon our work and continue improving the understanding of these crucial applications.
    Experiments were conducted on six datasets: five from the biomedical domain (Hetionet, PrimeKG, PharmKG, OpenBioLink2020 HQ, PharMeBINet) and one trivia KG (FB15k-237). All datasets were randomly split into training, validation and test set (80% / 10% / 10%; in the case of PharMeBINet, 99.3% / 0.35% / 0.35% to mitigate the increased inference cost on the larger dataset).
    On each dataset, four different KGE models were compared: TransE, DistMult, RotatE, TripleRE. Hyperparameters were tuned on the validation split and we release results for tail predictions on the test split. In particular, each test query (h,r,?) is scored against all entities in the KG and we compute the rank of the score of the correct completion (h,r,t) , after masking out scores of other (h,r,t') triples contained in the graph.
    Note: the ranks provided are computed as the average between the optimistic and pessimistic ranks of triple scores.
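
    The ranking procedure described above (score a query against all entities, mask other known true tails, then average the optimistic and pessimistic ranks) can be sketched as follows. The function name and the example scores are hypothetical, and the sketch assumes that a higher score means a more likely completion and that ranks start at 1; it is not the authors' implementation.

```python
def tail_rank(scores, true_tail, other_true_tails=()):
    """Average of optimistic and pessimistic rank of the correct tail.

    scores maps each candidate entity to its score for the query (h, r, ?).
    Entities in other_true_tails (other correct answers in the graph) are
    masked out before ranking. Ties with the correct tail are placed after
    it for the optimistic rank and before it for the pessimistic rank.
    """
    masked = set(other_true_tails) - {true_tail}
    target = scores[true_tail]
    higher = 0
    ties = 0
    for entity, score in scores.items():
        if entity == true_tail or entity in masked:
            continue
        if score > target:
            higher += 1
        elif score == target:
            ties += 1
    optimistic = higher + 1
    pessimistic = higher + ties + 1
    return (optimistic + pessimistic) / 2


# Invented scores for four candidate tails.
example_scores = {"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.1}
```

    With these invented scores, ranking "b" gives an optimistic rank of 2 and a pessimistic rank of 3, so the reported rank would be 2.5; masking "a" as another known true tail lowers it to 1.5.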
    Inside experimental_data.zip, the following files are provided for each dataset:
    • {dataset}_preprocessing.ipynb: a Jupyter notebook for downloading and preprocessing the dataset. In particular, this generates the custom label->ID mapping for entities and relations, and the numerical tensor of (h_ID,r_ID,t_ID) triples for all edges in the graph, which can be used to compute graph topological metrics (e.g., using kg-topology-toolbox) and compare them with the edge prediction accuracy.
    • test_ranks.csv: csv table with columns ["h", "r", "t"] specifying the head, relation, tail IDs of the test triples, and columns ["DistMult", "TransE", "RotatE", "TripleRE"] with the rank of the ground-truth tail in the ordered list of predictions made by the four models;
    • entity_dict.csv: the list of entity labels, ordered by entity ID (as generated in the preprocessing notebook);
    • relation_dict.csv: the list of relation labels, ordered by relation ID (as generated in the preprocessing notebook).

    The separate top_100_tail_predictions.zip archive contains, for each of the test queries in the corresponding test_ranks.csv table, the IDs of the top-100 tail predictions made by each of the four KGE models, ordered by decreasing likelihood. The predictions are released in a .npz archive of numpy arrays (one array of shape (n_test_triples, 100) for each of the KGE models).
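    As a hedged illustration of how such arrays might be used, the sketch below computes a Hits@k value from an array of top predictions of shape (n_test_triples, 100) and the true tail IDs. The function and argument names are hypothetical, and the sketch makes no assumption about how the arrays are keyed inside the .npz archive. The description does not say whether the released lists exclude other known tails, so the result may differ from a metric derived from test_ranks.csv.

```python
import numpy as np

def hits_at_k(top_predictions, true_tails, k=10):
    """Fraction of test queries whose true tail is among the first k predictions.

    top_predictions : int array of shape (n_test_triples, n_predictions),
                      each row ordered by decreasing likelihood
    true_tails      : int array of shape (n_test_triples,)
    """
    top_predictions = np.asarray(top_predictions)
    true_tails = np.asarray(true_tails)
    if top_predictions.shape[0] != true_tails.shape[0]:
        raise ValueError("one row of predictions is needed per test triple")
    # Compare the first k columns of each row against that row's true tail.
    hits = (top_predictions[:, :k] == true_tails[:, None]).any(axis=1)
    return float(hits.mean())
```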

    All experiments (training and inference) have been run on Graphcore IPU hardware using the BESS-KGE distribution framework.

  11. IP Australia - TM-Link | gimi9.com

    • gimi9.com
    Updated Feb 13, 2020
    Cite
    (2020). IP Australia - TM-Link | gimi9.com [Dataset]. https://gimi9.com/dataset/au_tm-link
    Explore at:
    Dataset updated
    Feb 13, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    Data Dictionary available here

    2020 August UPDATE: Fixed some quality issues with dates in the application table. Fixed some quality issues with names in the owner table.

    2020 March UPDATE: The TM-Link dataset has been updated to include more recent data and also has expanded to include more information. As such, the structure of the dataset has also changed.

    TM-Link is an international dataset IP Australia and Swinburne University have developed in collaboration. The dataset provides information from various jurisdictions, modelled under a common schema for greater accessibility to researchers and analysts. TM-Link also links together similar trade marks from different countries based on common information, such as similar trade mark phrases and applicant names. These links identify families of international trade marks, which provide a new and unique insight into international branding trends and export behaviours.

    IP Australia and Swinburne University are looking to continually develop TM-Link to become a core part of the global IP data landscape. If you have any suggestions or requests to model any additional data points, or improve the current accuracy of the data, please let us know via email to ipdataplatform@ipaustralia.gov.au.

    For more information on the linking algorithm, please see: Petrie S, Kollmann T, Codoreanu A, Thomson R & Webster E (2019); International Trademarking and Regional Export Performance. Available at SSRN: https://ssrn.com/abstract=3445244

    For more information on TM-Link data collection and descriptive analyses, please see: Petrie S, Adams M, Mitra‐Kahn B, Johnson M, Thomson R, Jensen PH, Palangkaraya A, & Webster EM (2019); TM-Link: An Internationally Linked Trade Mark Database. Australian Economic Review, Forthcoming. Available at SSRN: https://ssrn.com/abstract=3511526

  12. Global Biotic Interactions: Interpreted Data Products...

    • zenodo.org
    application/gzip, bin +1
    Updated Aug 26, 2023
    Cite
    GloBI Community (2023). Global Biotic Interactions: Interpreted Data Products hash://md5/89797a5a325ac5c50990581689718edf hash://sha256/946178b36c3ea2f2daa105ad244cf5d6cd236ec8c99956616557cf4e6666545b [Dataset]. http://doi.org/10.5281/zenodo.8284068
    Explore at:
    Available download formats: application/gzip, bin, zip
    Dataset updated
    Aug 26, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    GloBI Community
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Global Biotic Interactions: Interpreted Data Products

    Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are the result of a process in which versioned, existing species interaction datasets ([2]) are linked to the so-called GloBI Taxon Graph ([3]) and transformed into various aggregate formats (e.g., tsv, csv, neo4j, rdf/nquad, darwin core-ish archives). In addition, the applied name maps are included to make the applied taxonomic linking explicit.

    Citation
    --------

    GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these *original data contributors*, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference and dataset citation. Also, a full list of all references can be found in the citations.csv/citations.tsv files in this publication. If you have ideas on how to make it easier to cite original datasets, please open/join a discussion via https://globalbioticinteractions.org or related projects.

    To credit GloBI for more easily finding interaction data, please use the following citation to reference GloBI:

    Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.

    Bias and Errors
    --------

    As with any analysis and processing workflow, care should be taken to understand the bias and error propagation of data sources and related data transformation processes. The datasets indexed by GloBI are biased geospatially, temporally and taxonomically ([5], [6]). Also, mapping of verbatim names from datasets to known name concepts may contain errors due to synonym mismatches, outdated name lists, typos or conflicting name authorities. Finally, bugs may introduce bias and errors in the resulting integrated data product.

    To help better understand where bias and errors are introduced, only versioned data and code are used as an input: the datasets ([2]), name maps ([3]) and integration software ([6]) are versioned so that the integration processes can be reproduced if needed. This way, the steps taken to compile an integrated data record can be traced and the sources of bias and errors can be more easily found.

    Contents
    --------

    README:
    this file

    citations.csv.gz:
    contains data citations in a gzipped comma-separated values format.

    citations.tsv.gz:
    contains data citations in a gzipped tab-separated values format.

    datasets.csv.gz:
    contains list of indexed datasets in a gzipped comma-separated values format.

    datasets.tsv.gz:
    contains list of indexed datasets in a gzipped tab-separated values format.

    verbatim-interactions.csv.gz:
    contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.

    verbatim-interactions.tsv.gz:
    contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.

    interactions.csv.gz:
    contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.

    interactions.tsv.gz:
    contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.

    refuted-interactions.csv.gz:
    contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.

    refuted-interactions.tsv.gz:
    contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.

    refuted-verbatim-interactions.csv.gz:
    contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.

    refuted-verbatim-interactions.tsv.gz:
    contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.

    interactions.nq.gz:
    contains species interactions expressed in the resource description framework in a gzipped rdf/quads format.

    dwca-by-study.zip:
    contains species interactions data as a Darwin Core Archive aggregated by study using a custom, occurrence level, association extension.

    dwca.zip:
    contains species interactions data as a Darwin Core Archive using a custom, occurrence level, association extension.

    neo4j-graphdb.zip:
    contains a neo4j v3.5.x graph database snapshot containing a graph representation of the species interaction data.

    taxonCache.tsv.gz:
    contains hierarchies and identifiers associated with names from naming schemes in a gzipped tab-separated values format.

    taxonMap.tsv.gz:
    describes how names in existing datasets were mapped into existing naming schemes in a gzipped tab-separated values format.

    References
    -----

    [1] Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. doi: 10.1016/j.ecoinf.2014.08.005.

    [2] Poelen, J. H. (2020) Global Biotic Interactions: Elton Dataset Cache. Zenodo. doi: 10.5281/ZENODO.3950557.

    [3] Poelen, J. H. (2021). Global Biotic Interactions: Taxon Graph (Version 0.3.28) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4451472

    [4] Hortal, J. et al. (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), pp.523–549. doi: 10.1146/annurev-ecolsys-112414-054400.

    [5] Cains, M. et al. (2017) Ivmooc 2017 - Gap Analysis Of Globi: Identifying Research And Data Sharing Opportunities For Species Interactions. Zenodo. doi: 10.5281/ZENODO.814978.

    [6] Poelen, J. et al. (2022) globalbioticinteractions/globalbioticinteractions v0.24.6. Zenodo. doi: 10.5281/ZENODO.7327955.

    Content References
    -----

    hash://sha256/2ed02ef8ab52cb51aef6fb42badeb495ba6a87dd6cf11be5f480c7bc1c902054 citations.csv.gz
    hash://sha256/00195434368cec79f051ccb69238d2646b53530e4fd42936748428f055fdb0cc citations.tsv.gz
    hash://sha256/b8898e7aea05121e7d15948dcc76d4dde6ed330db98f76ebcc4c03ba52622dcc datasets.csv.gz
    hash://sha256/b8898e7aea05121e7d15948dcc76d4dde6ed330db98f76ebcc4c03ba52622dcc datasets.tsv.gz
    hash://sha256/aa13e6fb98fd3aa4aaeaa89d6dccfd983e542fe010f0ffbb31fa17243f5735e3 dwca-by-study.zip
    hash://sha256/4ae323bfc1255f3c6dd60b13a1be237cbfbd1c87aad595f7570e83fc9e84db08 dwca.zip
    hash://sha256/0f1328b00c1b44aa19cf677790a0e649ddceb2a4e0babbe251a4af9e032f3dde interactions.csv.gz
    hash://sha256/ad0297993328deee5178db4e5fe20135a21dde529f68adab63f8de9a02512514 interactions.nq.gz
    hash://sha256/1c8de35d42fb298f1a27f4eb286309e39e6ab768d24d3c3bec1490f23d3594b6 interactions.tsv.gz
    hash://sha256/f35ce82bf5c00882e4258edc883b41123f002c1fb9d64485abc101b00cb28e79 neo4j-graphdb.zip
    hash://sha256/b002bcb378482a33847725fc52c8e26a42af5c5da9755449d8f0d10c9aa9f7f0 refuted-interactions.csv.gz
    hash://sha256/7beb77546aad6e9de756d6161e35f55cfa725072ca77ba5c0b72a00e53146127 refuted-interactions.tsv.gz
    hash://sha256/89fa5fc3bdc76451dd5d2a79c1473b437615e5c7e551ec5e57ff8b71e9a280ea refuted-verbatim-interactions.csv.gz
    hash://sha256/ea83faba0aa0792cebe055832553197701025fdfe2f07ec34599075819916707 refuted-verbatim-interactions.tsv.gz
    hash://sha256/4cf48959ea839e371a0344aab4b31f36242c84ac24e44f4db948524523b3563f taxonCache.tsv.gz
    hash://sha256/bf38fe30df535f9e0b6b22fa726c10f35d391d616e6d107cc7582505141fd13d taxonMap.tsv.gz
    hash://sha256/ce0d4f35b0970df3fe4e1623e473a5390b39297efae7f9e1474bfe2e8bc15d48 verbatim-interactions.csv.gz
    hash://sha256/965718c7a9ec4ec1adc98413b52e31c090ad1ba5a04be088d579c5c9d59ffef0 verbatim-interactions.tsv.gz

    hash://md5/ad99f71b8d3e0b67b7d4578a0a123c40 citations.csv.gz
    hash://md5/2a27a963e745a12042c6c9886f87f842 citations.tsv.gz
    hash://md5/580a4e1cfed5a6235f6c35277d0c7b10 datasets.csv.gz
    hash://md5/580a4e1cfed5a6235f6c35277d0c7b10 datasets.tsv.gz
    hash://md5/6c7294aa2b507143e10c390ae6008ed1 dwca-by-study.zip
    hash://md5/a9694ecc6de81d9893998be05a8ef2de dwca.zip
    hash://md5/0415cd469b8892fb3f5435048b6e85bf interactions.csv.gz
    hash://md5/1b48bf7a344bdd3c706a94666607cd71 interactions.nq.gz
    hash://md5/445b2c97e2e44d2dbc4aa93084ecacfc interactions.tsv.gz
    hash://md5/ca3e4780032c8c58e90242bcdf1328d5 neo4j-graphdb.zip
    hash://md5/03600b16405fc2a4ea60925d69b6e16f
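    A downloaded file can be checked against the content references above by recomputing its hash. The following is a minimal sketch using Python's standard hashlib module; the function name and the chunk size are illustrative choices, not part of the GloBI tooling.

```python
import hashlib

def content_hash_uri(path, algorithm="sha256", chunk_size=1 << 20):
    """Return a 'hash://<algorithm>/<hex digest>' identifier for a local file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"hash://{algorithm}/{digest.hexdigest()}"
```

    For example, the value returned for a local copy of interactions.tsv.gz with `algorithm="sha256"` would be compared with the hash://sha256/ entry listed for that file, and with `algorithm="md5"` against the hash://md5/ entry.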

  13. Checklist and distribution of the species of Seychelles for conservationists...

    • gbif.org
    Updated Mar 19, 2024
    Cite
    Seychelles Key Biodiversity Areas National Coordination Group (2024). Checklist and distribution of the species of Seychelles for conservationists [Dataset]. http://doi.org/10.15468/rezu4h
    Explore at:
    Dataset updated
    Mar 19, 2024
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Seychelles National Herbarium
    Authors
    Seychelles Key Biodiversity Areas National Coordination Group
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1600 - Dec 31, 2022
    Area covered
    Description

    This dataset compiles most of the data from the "BIO" database for the Seychelles islands. It has three main objectives:

    1. To share a nationally agreed taxonomic index of all species recorded in the country.

    2. To associate with that checklist key data on the conservation value of these species or their invasiveness in Seychelles. This includes a National Red List using IUCN threat statuses assessed at the national level.

    3. To share all data available on the distribution of these species, including occurrences with exact coordinates except for species considered sensitive (which are provided here without exact coordinates, and for which the complete data are shared separately in a private GBIF dataset shared only with chosen conservation actors in Seychelles).

    In the first version, and in the short term, this dataset is restricted to plants, but in future the plan is to extend it to all taxonomic groups. In addition, because the species listed, and their taxonomic and conservation statuses, need to be reviewed and discussed with the local scientific community in Seychelles, we use this dataset publication as an opportunity to strengthen partnerships in Seychelles. The pre-published first version was presented to stakeholders and discussed. We agreed to form a Key Biodiversity Areas (KBA) National Coordination Group (NCG) (see list of contacts in the section "Associated Parties"), which is the collective author of this dataset and whose participants are involved in verifying and improving the dataset. This dataset will therefore serve as an open source repository for a formal KBA review in Seychelles. The group will eventually take part in, and be complemented by, a GBIF National Node which is being developed simultaneously.

    This dataset is accompanied by 3 R scripts available online (https://github.com/bsenterre/seychecklist):
    • The first shows how the BIO database manager converted the BIO data into text files ready for upload in the IPT.
    • The second downloads the dataset from GBIF and compiles it into an enriched format that is used for a Shiny app.
    • The third creates a Shiny app that allows users to explore the data and to verify the status of Key Biodiversity Areas based on the distribution of species triggers. The app also provides users with a nationally agreed checklist of species of the Seychelles along with their conservation value or invasion status.

    These scripts therefore provide a fully transparent approach to identifying KBAs, where the data is open source and where the data analysis and synthesis are also explicit and open source.

    To prepare this dataset, we have reviewed the various standards available with GBIF through the main 'cores' and extensions (http://rs.gbif.org/extension/gbif/1.0/). Based on that review, considering the content of our BIO database and differences between our taxonomic backbone and GBIF backbone, we have decided to prepare a dataset using the Taxon Core and the following extensions (i.e. a checklist based on occurrences):
    • Occurrence: for the core of the BIO database
    • Species Distribution: for biogeography aspects, of native range (endemic to what)
    • Species Profiles: for basic ecology (marine, freshwater, terrestrial), basic invasion ecology at species level (isInvasive) and basic functional biology
    • Vernacular names (although still in development)
    • Alternative identifiers: to link to GBIF IDs and to IUCN Red List IDs

    Complementary data are spread over the following other datasets:
    • seysensitive: a private dataset providing the exact geographic coordinates for sensitive taxa (sharing their occurrenceID with the obscured duplicate found in the current seychecklist dataset)
    • seynotinchecklist: an occurrence dataset containing all species occurrence data from the BIO database which are not linked to a species name listed in the current seychecklist dataset (https://www.gbif.org/dataset/99ccf1cc-03e3-4bd4-8a78-50d46dee8cb7)
    • seyvegplot: dataset compiling vegetation plots, with eventID linking to the seychecklist dataset (https://www.gbif.org/dataset/4fc42f17-eaeb-4296-949d-34b8414eb1c1)
    • ecosystemology: a dataset providing an index of ecosystem types, their names and synonymies (https://www.gbif.org/dataset/f513fe98-b1c3-45ee-8e14-7f2a5b7890bf). The ID of each individual stand (ecosystem occurrence) is referred to in the seychecklist dataset using the field eventRemarks (while locationID is used to store the code for the location of the stand).

  14. HUN Surface Water Model Nodes 20170110 HRV Ratios v01

    • demo.dev.magda.io
    • data.gov.au
    Updated Aug 8, 2023
    Cite
    Bioregional Assessment Program (2023). HUN Surface Water Model Nodes 20170110 HRV Ratios v01 [Dataset]. https://demo.dev.magda.io/dataset/ds-dga-703a23f8-22f3-4fc3-be66-c1764f24a596
    Explore at:
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Bioregional Assessment Program
    Description

    Abstract

    This dataset was derived by the Bioregional Assessment Programme from source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    The dataset consists of three shapefiles containing point features of Hunter subregion surface water nodes with summarised hydrological response variables (HRV) data for zero flow days (ZFD), high flow days (HFD) and annual flow (AF). Maps use the ratio data for mapping with classes. The changes in the number of ZFD, HFD and AF due to additional coal resource development relative to the interannual variability flow days under the baseline has been adopted to put some context around the modelled changes. This ratio of absolute maximum change and variability range (5th, 50th and 95th percentiles) has been calculated qualitatively for each surface water model node.

    Dataset History

    These shape files bring model node spatial location and HRV ratios together.

    The input surface water HRV shape file (HUN_SW_Modelling_Reaches_and_HRV_lookup_20171121_v05, GUID: 8c330d59-2ecc-4c35-8f9e-68b91a4ae98a) has been augmented with data from the spreadsheet (HUN_HRVs_scatterplots_version3.xlsx, GUID: 1c0a19f9-98c2-4d92-956d-dd764aaa10f9), by linking to the node number.

    For each dataset, a new shapefile was produced - zero flow days (ZFD), high flow days (HFD) and annual flow (AF).

    The data taken from the spreadsheet was:

    ZFD. ZFD_corrected worksheet for ratio data with p_max data coming from ZFD worksheet.

    HFD. FD_corrected worksheet for both ratio and amax data

    AF. AF_corrected worksheet for both ratio and amax data

    Maps use the ratio data for mapping with classes shown on maps except where the following was true, the ratio was replaced with 'no significant change'

    amax for LFD <3 (i.e. 3 is a blue dot, not an open circle)

    amax for FD <3 (i.e. 3 is a blue dot not an open circle)

    pmax for AF <1 percent (note this is pmax NOT amax)

    Dataset Citation

    Bioregional Assessment Programme (XXXX) HUN Surface Water Model Nodes 20170110 HRV Ratios v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/155fdbdc-3cb1-4fb9-bb76-5380463e955c.

    Dataset Ancestors

    • Derived From River Styles Spatial Layer for New South Wales
    • Derived From SYD ALL climate data statistics summary
    • Derived From HUN AWRA-L simulation nodes_v01
    • Derived From Hunter River Salinity Scheme Discharge NSW EPA 2006-2012
    • Derived From Geofabric Surface Network - V2.1
    • Derived From HUN AWRA-R simulation nodes v01
    • Derived From Bioregional Assessment areas v06
    • Derived From Hunter AWRA Hydrological Response Variables (HRV)
    • Derived From GEODATA 9 second DEM and D8: Digital Elevation Model Version 3 and Flow Direction Grid 2008
    • Derived From Bioregional Assessment areas v04
    • Derived From Geofabric Surface Network - V2.1.1
    • Derived From HUN AWRA-R Gauge Station Cross Sections v01
    • Derived From Gippsland Project boundary
    • Derived From Natural Resource Management (NRM) Regions 2010
    • Derived From BA All Regions BILO cells in subregions shapefile
    • Derived From Hunter Surface Water data v2 20140724
    • Derived From HUN AWRA-R River Reaches Simulation v01
    • Derived From HUN SW Model nodes 20170110
    • Derived From HUN AWRA-L simulation nodes v02
    • Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
    • Derived From Bioregional_Assessment_Programme_Catchment Scale Land Use of Australia - 2014
    • Derived From GEODATA TOPO 250K Series 3
    • Derived From NSW Catchment Management Authority Boundaries 20130917
    • Derived From Geological Provinces - Full Extent
    • Derived From BA SYD selected GA TOPO 250K data plus added map features
    • Derived From HUN gridded daily PET from 1973-2102 v01
    • Derived From HUN AWRA-R Irrigation Area Extents and Crop Types v01
    • Derived From Bioregional Assessment areas v03
    • Derived From IQQM Model Simulation Regulated Rivers NSW DPI HUN 20150615
    • Derived From HUN AWRA-R calibration catchments v01
    • Derived From Bioregional Assessment areas v05
    • Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
    • Derived From National Surface Water sites Hydstra
    • Derived From Selected streamflow gauges within and near the Hunter subregion
    • Derived From HUN SW Modelling Reaches and HRV lookup 20171121 v05
    • Derived From ASRIS Continental-scale soil property predictions 2001
    • Derived From HUN Comparison of model variability and interannual variability
    • Derived From HUN River Perenniality v01
    • Derived From Hunter Surface Water data extracted 20140718
    • Derived From Mean Annual Climate Data of Australia 1981 to 2012
    • Derived From HUN AWRA-R calibration nodes v01
    • Derived From HUN AWRA-R Observed storage volumes Glenbawn Dam and Glennies Creek Dam
    • Derived From HUN future climate rainfall v01
    • Derived From HUN AWRA-LR Model v01
    • Derived From HUN AWRA-L ASRIS soil properties v01
    • Derived From HUN AWRAR restricted input 01
    • Derived From Bioregional Assessment areas v01
    • Derived From Bioregional Assessment areas v02
    • Derived From Victoria - Seamless Geology 2014
    • Derived From HUN AWRA-L Site Station Cross Sections v01
    • Derived From HUN AWRA-R simulation catchments v01
    • Derived From HUN AWRA-R Simulation Node Cross Sections v01
    • Derived From Climate model 0.05x0.05 cells and cell centroids

  15. Namoi HRV ratios

    • data.gov.au
    • researchdata.edu.au
    Updated Nov 20, 2019
    Cite
    Bioregional Assessment Program (2019). Namoi HRV ratios [Dataset]. https://data.gov.au/data/dataset/groups/f86caa9b-bc0e-4fef-a5b0-68457f19d222
    Explore at:
    Dataset updated
    Nov 20, 2019
    Dataset provided by
    Bioregional Assessment Program
    Area covered
    Namoi River
    Description

    Abstract

    This dataset was derived by the Bioregional Assessment Programme from source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Three shapefiles containing point features of Namoi subregion surface water nodes with summarised hydrological response variables (HRV) data for zero flow days (ZFD), high flow days (HFD) and annual flow (AF).

    Purpose

    Created to facilitate NAM 3-4 product maps.

    Dataset History

    The input surface water HRV shape file (AWRAR_NAM_reaches_V2, GUID: 433a27f1-cee8-499e-970a-607c6a25e979) has been augmented with data from the spreadsheet (NAM_HRVs_interperiod_scatter_spatial_plot_data_v3), by linking to the node number.

    For the Namoi subregion an extra step was required because the node numbers in the shapefile are not the same as those in the spreadsheet, so a linking table (gauge_node_relation.xlsx) was used.

    For each dataset, a new shapefile was produced - zero flow days (ZFD), high flow days (HFD) and annual flow (AF).

    The data taken from the spreadsheet was:

    ZFD. ZFD_corrected worksheet for ratio data with p_max data coming from ZFD worksheet.

    HFD. FD_corrected worksheet for both ratio and amax data

    AF. AF_corrected worksheet for both ratio and amax data

    Maps use the ratio data for mapping with classes shown on maps except where the following was true, the ratio was replaced with 'no significant change'

    amax for LFD <3 (i.e. 3 is a blue dot, not an open circle)

    amax for FD <3 (i.e. 3 is a blue dot not an open circle)

    pmax for AF <1 percent (note this is pmax NOT amax)
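    The replacement rule above can be sketched as a small function. This is an illustration only, not the Programme's processing code: the function name and arguments are hypothetical, the variable labels LFD, FD and AF are taken verbatim from the rule as written, and amax is assumed to be a non-negative absolute maximum change.

```python
NO_CHANGE = "no significant change"

def value_to_map(variable, ratio, amax=None, pmax=None):
    """Return the value to map for one node: the ratio, or NO_CHANGE.

    variable : label used in the rule, one of "LFD", "FD" or "AF"
    ratio    : ratio value taken from the corresponding worksheet
    amax     : absolute maximum change (needed for "LFD" and "FD")
    pmax     : maximum change in percent (needed for "AF")
    """
    if variable in ("LFD", "FD"):
        if amax is None:
            raise ValueError("amax is required for LFD and FD")
        # Strict comparison: a value of exactly 3 keeps its ratio.
        return NO_CHANGE if amax < 3 else ratio
    if variable == "AF":
        if pmax is None:
            raise ValueError("pmax is required for AF")
        # For annual flow the threshold applies to pmax, not amax.
        return NO_CHANGE if pmax < 1 else ratio
    raise ValueError(f"unknown variable: {variable!r}")
```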

    Dataset Citation

    Bioregional Assessment Programme (2017) Namoi HRV ratios. Bioregional Assessment Derived Dataset. Viewed 11 December 2018, http://data.bioregionalassessments.gov.au/dataset/f86caa9b-bc0e-4fef-a5b0-68457f19d222.

    Dataset Ancestors

  16. Dataset for "Cognitive behavioural therapy self-help intervention...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 16, 2024
    Cite
    Chelsea Coumoundouros; Paul Farrand; Alexander Hamilton; Louise Von Essen; Robbert Sanderman; Joanne Woodford (2024). Dataset for "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey" [Dataset]. http://doi.org/10.5281/zenodo.7104638
    Explore at:
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chelsea Coumoundouros; Paul Farrand; Alexander Hamilton; Louise Von Essen; Robbert Sanderman; Joanne Woodford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and R code used for the analysis of data for the publication: Coumoundouros et al., Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey. BMC Nephrology

    Summary of study

    An online cross-sectional survey for informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences, and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety and stress symptoms).

    Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old.

    The online survey included questions regarding (1) informal caregiver's characteristics; (2) care recipient's characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) informal caregiver's mental health. Informal caregiver's mental health was assessed using the 21 item Depression, Anxiety, and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety, and stress, respectively.

    Sixty-five individuals participated in the survey.

    See the published article for full study details.

    Description of uploaded files

    1. ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of each participant's postal code was collected; however, this information was removed from the uploaded dataset to ensure participant anonymity.

    2. ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. Data cleaning involved imputation of missing data if participants were missing data for one item within a subscale of the DASS-21. Missing values were imputed by finding the mean of all other items within the relevant subscale.

    3. ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with key linking item labels in uploaded datasets with the corresponding survey question.

    4. R Code for Kidney Carer Survey_2022-08-30: R file of R code used to analyse survey data.

    5. R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of R code used to analyse survey data.
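    The imputation rule described for the cleaned DASS-21 data (item 2 above) can be illustrated with a short sketch. The original analysis was carried out in R; the Python function below is a hypothetical re-expression of the stated rule, not the authors' code. It assumes one subscale is passed as a list with `None` marking a missing item, and it leaves the imputed mean unrounded because the description does not mention rounding.

```python
def impute_subscale(items):
    """Impute a single missing item with the mean of the other items.

    items : item scores for one DASS-21 subscale, with None for missing values.
    Returns a new list. If zero items or more than one item are missing,
    the scores are returned unchanged.
    """
    missing = [index for index, value in enumerate(items) if value is None]
    if len(missing) != 1:
        return list(items)
    observed = [value for value in items if value is not None]
    if not observed:
        # A subscale consisting of one missing item has nothing to average.
        return list(items)
    filled = list(items)
    filled[missing[0]] = sum(observed) / len(observed)
    return filled
```

    For a subscale scored `[1, 2, None, 3, 0, 1, 2]`, the six observed items sum to 9, so the missing item is filled with 1.5.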

  17.

    Dataset — Make Reddit Great Again: Assessing Community Effects of Moderation...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 10, 2023
    Cite
    Cresci, Stefano (2023). Dataset — Make Reddit Great Again: Assessing Community Effects of Moderation Interventions on r/The_Donald [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6250576
    Dataset provided by
    Trujillo, Amaury
    Cresci, Stefano
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reddit contents and complementary data regarding the r/The_Donald community and its main moderation interventions, used for the corresponding article indicated in the title.

    An accompanying R notebook can be found in: https://github.com/amauryt/make_reddit_great_again

    If you use this dataset please cite the related article.

    The dataset timeframe of the Reddit contents (submissions and comments) spans from 30 weeks before Quarantine (2018-11-28) to 30 weeks after Restriction (2020-09-23). The original Reddit content was collected from the Pushshift monthly data files, transformed, and loaded into two SQLite databases.

    The first database, the_donald.sqlite, contains all the available content from r/The_Donald created during the dataset timeframe, with the last content being posted several weeks before the timeframe upper limit. It has only two tables: submissions and comments. Note that content IDs are stored in base 10 (numeric integer), unlike the original base 36 (alphanumeric) used on Reddit and Pushshift. This allows efficient storage and processing. If necessary, many programming languages or libraries can easily convert IDs from one base to another.
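
    As an illustration of the base conversion mentioned above, here is a minimal Python sketch. The helper names `base36_to_int` and `int_to_base36` are assumptions made for this example, and the IDs shown are arbitrary rather than taken from the dataset.

```python
import string

# Digits 0-9 followed by lowercase a-z: the 36 symbols of base 36.
ALPHABET = string.digits + string.ascii_lowercase


def base36_to_int(text: str) -> int:
    """Convert a base-36 ID (as used on Reddit and Pushshift) to an integer."""
    return int(text, 36)


def int_to_base36(number: int) -> str:
    """Convert a non-negative integer back to its lowercase base-36 form."""
    if number < 0:
        raise ValueError("IDs must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


print(base36_to_int("zz"))   # 1295
print(int_to_base36(1295))   # zz
```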

    The second database, core_the_donald.sqlite, contains all the available content that core users of r/The_Donald posted platform-wide (i.e., both within and outside the subreddit) during the dataset timeframe. Core users are defined as those who authored either a submission or a comment a week in r/The_Donald during the 30 weeks prior to the subreddit's Quarantine. The database has four tables: submissions, comments, subreddits, and perspective_scores. The subreddits table contains the names of the subreddits to which submissions and comments were made (their IDs are also in base 10). The perspective_scores table contains comment toxicity scores.

    The Perspective API was used to score comments based on the attributes toxicity and severe_toxicity. It should be noted that not all of the comments in core_the_donald have a score because the comment body was blank or because the Perspective API returned a request error (after three tries). However, the percentage of missing scores is minuscule.

    A third file, mbfc_scores.csv, contains the bias and factual reporting accuracy collected in October 2021 from Media Bias / Fact Check (MBFC). Both attributes are scored in a Likert-like manner. Submissions can be associated with MBFC scores by joining on the domain column.
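
    A join on the domain column might look like the sketch below. Only the `domain` column and the `submissions` table name come from the description; the other column names (`id`, `bias`, `factual_reporting`), the `mbfc_scores` table (standing in for the CSV after it has been loaded), and the sample rows are hypothetical and used purely for illustration.

```python
import sqlite3


def join_submissions_with_mbfc(conn: sqlite3.Connection):
    """Left-join submissions to MBFC scores on the shared domain column."""
    query = """
        SELECT s.id, s.domain, m.bias, m.factual_reporting
        FROM submissions AS s
        LEFT JOIN mbfc_scores AS m ON m.domain = s.domain
        ORDER BY s.id
    """
    return conn.execute(query).fetchall()


# In-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, domain TEXT);
    CREATE TABLE mbfc_scores (domain TEXT PRIMARY KEY, bias INTEGER, factual_reporting INTEGER);
    INSERT INTO submissions VALUES (1, 'example.com'), (2, 'example.org'), (3, 'unrated.example');
    INSERT INTO mbfc_scores VALUES ('example.com', 2, 4), ('example.org', -1, 3);
""")

rows = join_submissions_with_mbfc(conn)
for row in rows:
    print(row)
```

    A left join keeps submissions whose domain has no MBFC rating (their score columns come back as NULL); an inner join would drop them instead.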

  18. Z

    Data from: AgrImOnIA: Open Access dataset correlating livestock and air...

    • data.niaid.nih.gov
    • data.subak.org
    • +1 more
    Updated Feb 6, 2024
    Cite
    Golini, Natalia (2024). AgrImOnIA: Open Access dataset correlating livestock and air quality in the Lombardy region, Italy [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6620529
    Dataset provided by
    Finazzi, Francesco
    Fusta Moro, Alessandro
    Maranzano, Paolo
    Vinciguerra, Marco
    Otto, Philipp
    Golini, Natalia
    Ignaccolo, Rosaria
    Rodeschini, Jacopo
    Fassò, Alessandro
    Shaboviq, Qendrim
    Cameletti, Michela
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy, Lombardy
    Description

    The AgrImOnIA dataset is a comprehensive dataset relating air quality and livestock (expressed as the density of bovines and swine bred) along with weather and other variables. The AgrImOnIA Dataset represents the first step of the AgrImOnIA project. The purpose of this dataset is to give the opportunity to assess the impact of agriculture on air quality in Lombardy through statistical techniques capable of highlighting the relationship between the livestock sector and air pollutants concentrations.

    The building process of the dataset is detailed in the companion paper:

    A. Fassò, J. Rodeschini, A. Fusta Moro, Q. Shaboviq, P. Maranzano, M. Cameletti, F. Finazzi, N. Golini, R. Ignaccolo, and P. Otto (2023). Agrimonia: a dataset on livestock, meteorology and air quality in the Lombardy region, Italy. SCIENTIFIC DATA, 1-19.

    available here.

    This dataset is a collection of estimated daily values for a range of measurements across different dimensions, such as air quality, meteorology, emissions, livestock animals and land use. Data are related to Lombardy and the surrounding area for 2016-2021, inclusive. The surrounding area is obtained by applying a 0.3° buffer to the Lombardy borders.

    The data uses several aggregation and interpolation methods to estimate the measurement for all days.

    The files in the record, renamed according to their version (e.g. .._v_3_0_0), are:

    Agrimonia_Dataset.csv (.mat and .Rdata), which is built by joining the daily time series related to the AQ, WE, EM, LI and LA variables. To simplify access to variables in the Agrimonia dataset, each variable name starts with the dimension of the variable, e.g. the names of the variables related to the AQ dimension start with 'AQ_'. This file is also archived in formats for MATLAB and R.

    Metadata_Agrimonia.csv which provides further information about the Agrimonia variables: e.g. sources used, original names of the variables imported, transformations applied.

    Metadata_AQ_imputation_uncertainty.csv which contains the daily uncertainty estimate of the imputed observation for the AQ to mitigate missing data in the hourly time series.

    Metadata_LA_CORINE_labels.csv which contains the label and the description associated with the CLC class.

    Metadata_monitoring_network_registry.csv which contains all details about the AQ monitoring stations used to build the dataset. Information about air quality monitoring stations includes: station type, municipality code, environment type, altitude, pollutants sampled and others. Each row represents a single sensor.

    Metadata_LA_SIARL_labels.csv which contains the label and the description associated with the SIARL class.

    AGC_Dataset.csv (.mat and .Rdata), which includes daily data of almost all variables available in the Agrimonia Dataset (excluding AQ variables) on an equidistant grid covering the Lombardy region and its surrounding area.
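
    Because variable names start with their dimension, columns can be grouped by prefix. The sketch below assumes only that naming convention; the helper name `group_by_dimension` is an assumption, and the header is a small illustrative subset built from variable names mentioned in this record, not the full column list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_dimension(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group column names by the dimension prefix before the first underscore."""
    groups = defaultdict(list)
    for name in columns:
        prefix, separator, _ = name.partition("_")
        if separator:  # skip columns that carry no dimension prefix
            groups[prefix].append(name)
    return dict(groups)


# Illustrative subset of column names; the real file has many more.
header = [
    "WE_rh_min", "WE_rh_mean", "WE_tot_precipitation",
    "LI_pigs_v2", "LI_bovine_v2", "LA_land_use",
]
groups = group_by_dimension(header)
print(groups["WE"])  # ['WE_rh_min', 'WE_rh_mean', 'WE_tot_precipitation']
```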

    The Agrimonia dataset can be reproduced using the code available at the GitHub page: https://github.com/AgrImOnIA-project/AgrImOnIA_Data

    UPDATE 31/05/2023 - NEW RELEASE - V 3.0.0

    A new version of the dataset is released: Agrimonia_Dataset_v_3_0_0.csv (.Rdata and .mat), where the variables WE_rh_min, WE_rh_mean and WE_rh_max have been recomputed due to some bugs.

    In addition, two new columns are added: LI_pigs_v2 and LI_bovine_v2. They represent the density of pigs and bovines (expressed as animals per square kilometre) in a square of approximately 10 x 10 km centred on the station location.

    A new dataset is released: the Agrimonia Grid Covariates (AGC), which includes daily information for the period from 2016 to 2020 for almost all variables within the Agrimonia Dataset on an equidistant grid containing the Lombardy region and its surrounding area. The AGC does not include AQ variables, as they come from monitoring stations that are irregularly spread over the area considered.

    UPDATE 11/03/2023 - NEW RELEASE - V 2.0.2

    A new version of the dataset is released: Agrimonia_Dataset_v_2_0_2.csv (.Rdata), where the variable WE_tot_precipitation has been recomputed due to some bugs.

    A new version of the metadata is available: Metadata_Agrimonia_v_2_0_2.csv where the spatial resolution of the variable WE_precipitation_t is corrected.

    UPDATE 24/01/2023 - NEW RELEASE - V 2.0.1

    Minor bug fixed.

    UPDATE 16/01/2023 - NEW RELEASE - V 2.0.0

    A new version of the dataset is released, Agrimonia_Dataset_v_2_0_0.csv (.Rdata) and Metadata_monitoring_network_registry_v_2_0_0.csv. Some minor points have been addressed:

    Added values for LA_land_use variable for Switzerland stations (in Agrimonia Dataset_v_2_0_0.csv)

    Deleted incorrect values for LA_soil_use variable for stations outside Lombardy region during 2018 (in Agrimonia Dataset_v_2_0_0.csv)

    Fixed duplicate sensors corresponding to the same pollutant within the same station (in Metadata_monitoring_network_registry_v_2_0_0.csv)

  19. r

    Water Modelling-Modelled Data-Regional Water Strategy

    • researchdata.edu.au
    • data.nsw.gov.au
    • +1 more
    Updated Dec 16, 2024
    + more versions
    Cite
    data.nsw.gov.au (2024). Water Modelling-Modelled Data-Regional Water Strategy [Dataset]. https://researchdata.edu.au/water-modelling-modelled-water-strategy/3441210
    Dataset provided by
    data.nsw.gov.au
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    The datasets provided contain modelled daily streamflow and storage volume data for several NSW river systems. These data were generated by simulating baseline river system models used to inform the development of Regional Water Strategies. The models were simulated for three different climate scenarios: instrumental climate (about 130 years), paleo-stochastic climate (about 10,000 years), and paleo-stochastic climate with climate projection based on NARCliM 1.0 (about 10,000 years).

    Each modelled output is published as a ZIP file which contains two PDF files (.pdf) and three time series data files (.csv).

    Note: To access and download datasets for specific regions, such as the Lachlan river system, please navigate to the respective child assets beneath this parent asset.

    For more information on the NSW regional water strategies program, please refer to the following website: https://www.dpie.nsw.gov.au/water/our-work/plans-and-strategies/regional-water-strategies

    Note: If you would like to ask a question, make any suggestions, or tell us how you are using this dataset, please visit the NSW Water Hub, which has an online forum you can join.
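
    Since each modelled output is described as a ZIP file holding PDF and CSV files, the time series can be read without unpacking the archive by hand. The following is a minimal Python sketch under stated assumptions: the helper name `read_csv_members`, the member file names, the column layout and the UTF-8 encoding are all hypothetical, and the archive is built in memory only so that the example is self-contained.

```python
import csv
import io
import zipfile


def read_csv_members(archive) -> dict:
    """Return {member name: list of rows} for every .csv file in a ZIP archive."""
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".csv"):
                with zf.open(name) as raw:
                    # Assumes UTF-8 text; adjust if the real files differ.
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    tables[name] = list(csv.reader(text))
    return tables


# Build a tiny stand-in archive with made-up member names and values.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("report.pdf", b"%PDF placeholder")
    zf.writestr("streamflow.csv", "date,flow\n2000-01-01,12.5\n")
buffer.seek(0)

tables = read_csv_members(buffer)
print(tables["streamflow.csv"])  # [['date', 'flow'], ['2000-01-01', '12.5']]
```

    With a downloaded archive, a file path can be passed in place of the in-memory buffer.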

  20. Maggie Creek Water Quality Data for Ecological Proper Functioning Condition...

    • datasets.ai
    • catalog.data.gov
    Updated Sep 13, 2024
    Cite
    U.S. Environmental Protection Agency (2024). Maggie Creek Water Quality Data for Ecological Proper Functioning Condition Analysis [Dataset]. https://datasets.ai/datasets/maggie-creek-water-quality-data-for-ecological-proper-functioning-condition-analysis
    53 available download formats
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Authors
    U.S. Environmental Protection Agency
    Description

    These data are "standard" water quality parameters collected for surface water condition analysis (for example pH, conductivity, DO, TSS).

    This dataset is associated with the following publication: Kozlowski, D., R. Hall, S. Swanson, and D. Heggem. Linking Management and Riparian Physical Functions to Water Quality and Aquatic Habitat. JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT. American Society of Civil Engineers (ASCE), Reston, VA, USA, 8(8): 797-815, (2016).
