88 datasets found

Merger of BNV-D data (2008 to 2019) and enrichment
data.europa.eu
zip
Updated Jan 16, 2025
Cite
Patrick VINCOURT (2025). Merger of BNV-D data (2008 to 2019) and enrichment [Dataset]. https://data.europa.eu/data/datasets/5f1c3eca9d149439e50c740f
Explore at:
Available download formats: zip (18530465)
Dataset updated
Jan 16, 2025
Dataset authored and provided by
Patrick VINCOURT
Description
Merging (in R) of the data published at https://www.data.gouv.fr/fr/datasets/ventes-de-pesticides-par-departement/, joined with two other sources of information associated with marketing authorisations (MAs; AMM in French):
- uses: https://www.data.gouv.fr/fr/datasets/usages-des-produits-phytosanitaires/
- the “Biocontrol” status of each product, from document DGAL/SDQSPV/2020-784, published on 18/12/2020 at https://agriculture.gouv.fr/quest-ce-que-le-biocontrole

All the initial files (.csv converted to .txt), the R code used to merge the data, and the various output files are collected in a zip.

NB:
1) “YASCUB” stands for {year,AMM,Substance_active,Classification,Usage,Statut_“BioConttrol”}; substances not on the DGAL/SDQSPV list are coded NA.
2) The file of biocontrol products must be cleaned of the duplicates generated by marketing authorisations that cover several trade names.
3) The BNVD_BioC_DY3 table and the output file BNVD_BioC_DY3.txt contain the fields {Code_Region,Region,Dept,Code_Dept,Anne,Usage,Classification,Type_BioC,Quantite_substance}.
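The merge described in this entry can be illustrated with a minimal Python sketch. It is not the author's R code (which ships in the zip); it only shows the general idea of joining sales records to uses and to a de-duplicated biocontrol list by marketing authorisation number, coding missing biocontrol status as NA. All record contents and every field name other than AMM are hypothetical.

```python
# Hypothetical miniature inputs; only the AMM join key comes from the description.
sales = [
    {"year": 2018, "AMM": "A1", "substance": "s1", "quantity": 10.0},
    {"year": 2019, "AMM": "A2", "substance": "s2", "quantity": 5.0},
]
usages = [
    {"AMM": "A1", "usage": "cereals"},
    {"AMM": "A2", "usage": "vines"},
]
# One authorisation can appear under several trade names, hence the duplicates.
biocontrol = [
    {"AMM": "A1", "trade_name": "X", "type": "microorganism"},
    {"AMM": "A1", "trade_name": "Y", "type": "microorganism"},
]


def dedupe_by_amm(rows):
    """Keep the first biocontrol type seen for each AMM number."""
    by_amm = {}
    for row in rows:
        by_amm.setdefault(row["AMM"], row["type"])
    return by_amm


def merge(sales, usages, biocontrol):
    """Attach usage and biocontrol status to every sales record."""
    usage_by_amm = {}
    for u in usages:
        usage_by_amm.setdefault(u["AMM"], []).append(u["usage"])
    bioc_by_amm = dedupe_by_amm(biocontrol)

    merged = []
    for s in sales:
        # A sale with no known usage still yields one row, with usage None.
        for usage in usage_by_amm.get(s["AMM"], [None]):
            merged.append({
                **s,
                "usage": usage,
                # Products absent from the biocontrol list are coded "NA".
                "biocontrol": bioc_by_amm.get(s["AMM"], "NA"),
            })
    return merged


merged = merge(sales, usages, biocontrol)
for row in merged:
    print(row)
```

De-duplicating before the join matters: without it, the two trade names for A1 would double that product's sales rows and inflate any quantity totals.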
Replication Data for: "A Topic-based Segmentation Model for Identifying...
search.dataone.org
Updated Sep 25, 2024
Cite
Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert (2024). Replication Data for: "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews" [Dataset]. http://doi.org/10.7910/DVN/EE3DE2
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/EE3DE2
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Kim, Sunghoon; Lee, Sanghak; McCulloch, Robert
Description
We provide instructions, code, and datasets for replicating the article by Kim, Lee and McCulloch (2024), "A Topic-based Segmentation Model for Identifying Segment-Level Drivers of Star Ratings from Unstructured Text Reviews." This repository provides a user-friendly R package that lets researchers and practitioners apply the Topic-based Segmentation Model with Unstructured Texts (latent class regression with group variable selection) to their own datasets.

First, we provide R code to replicate the illustrative simulation study: see file 1. Second, we provide the user-friendly R package with a very simple example to help apply the model to real-world datasets: see file 2, Package_MixtureRegression_GroupVariableSelection.R and Dendrogram.R. Third, we provide code and instructions to replicate the empirical studies of customer-level segmentation and restaurant-level segmentation with Yelp reviews data: see files 3-a, 3-b, 4-a, 4-b. Note that, because of Yelp's dataset terms of use and restrictions on data size, we provide a link to download the same Yelp datasets (https://www.kaggle.com/datasets/yelp-dataset/yelp-dataset/versions/6). Fourth, we provide code and datasets to replicate the empirical study with professor ratings reviews data: see file 5. Please see more details in the description text and comments of each file.

[A guide on how to use the code to reproduce each study in the paper]

1. Full codes for replicating Illustrative simulation study.txt -- [see Table 2 and Figure 2 in main text]: R source code to replicate the illustrative simulation study. Run it from beginning to end in R. In addition to estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships, you will get dendrograms of selected groups of variables in Figure 2. Computing time is approximately 20 to 30 minutes.

3-a. Preprocessing raw Yelp Reviews for Customer-level Segmentation.txt: Code for preprocessing the downloaded unstructured Yelp review data and preparing the DV and IV matrices for the customer-level segmentation study.

3-b. Instruction for replicating Customer-level Segmentation analysis.txt -- [see Table 10 in main text; Tables F-1, F-2, and F-3 and Figure F-1 in Web Appendix]: Code for replicating the customer-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 3 to 4 hours.

4-a. Preprocessing raw Yelp reviews_Restaruant Segmentation (1).txt: R code for preprocessing the downloaded unstructured Yelp data and preparing the DV and IV matrices for the restaurant-level segmentation study.

4-b. Instructions for replicating restaurant-level segmentation analysis.txt -- [see Tables 5, 6 and 7 in main text; Tables E-4 and E-5 and Figure H-1 in Web Appendix]: Code for replicating the restaurant-level segmentation study with Yelp data. You will get estimated coefficients (posterior means of coefficients), indicators of variable selections, and segment memberships. Computing time is approximately 10 to 12 hours.

[Guidelines for running Benchmark models in Table 6]

- Unsupervised topic model: 'topicmodels' package in R. After determining the number of topics (e.g., with the 'ldatuning' R package), run the 'LDA' function in the 'topicmodels' package. Then compute topic probabilities per restaurant (with the 'posterior' function in the package), which can be used as predictors, and conduct prediction with regression.
- Hierarchical topic model (HDP): 'gensimr' R package, using the 'model_hdp' function to identify topics (see https://radimrehurek.com/gensim/models/hdpmodel.html or https://gensimr.news-r.org/).
- Supervised topic model: 'lda' R package, using the 'slda.em' function for training and 'slda.predict' for prediction.
- Aggregate regression: 'lm' default function in R.
- Latent class regression without variable selection: 'flexmix' function in the 'flexmix' R package. Run flexmix with a chosen number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, predict the dependent variable for each segment.
- Latent class regression with variable selection: 'Unconstraind_Bayes_Mixture' function in the package of Kim, Fong and DeSarbo (2012). Run the Kim et al. (2012) model with a chosen number of segments (e.g., 3 segments in this study). Then, with the estimated coefficients and memberships, predict the dependent variable for each segment. The R package ('KimFongDeSarbo2012.zip') can be downloaded at: https://sites.google.com/scarletmail.rutgers.edu/r-code-packages/home

5. Instructions for replicating Professor ratings review study.txt -- [see Tables G-1, G-2, G-4 and G-5, and Figures G-1 and H-2 in Web Appendix]: Code to replicate the Professor ratings reviews study. Computing time is approximately 10 hours.

[A list of the versions of R, packages, and computer...

Harmonized global datasets of soil carbon and heterotrophic respiration from...

zenodo.org

bin, nc

Updated Jun 28, 2025

Cite

Shoji Hashimoto; Akihiko Ito; Kazuya Nishina (2025). Harmonized global datasets of soil carbon and heterotrophic respiration from data-driven estimates, with derived turnover time and Q10 [Dataset]. http://doi.org/10.5281/zenodo.15110783

Explore at:

Available download formats: nc, bin

Unique identifier

https://doi.org/10.5281/zenodo.15110783

Dataset updated

Jun 28, 2025

Dataset provided by

Zenodo http://zenodo.org/

Authors

Shoji Hashimoto; Akihiko Ito; Kazuya Nishina

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

We collected all available global soil carbon (C) and heterotrophic respiration (R_H) maps derived from data-driven estimates, sourcing them from public repositories and supplementary materials of previous studies (Table 1). All spatial datasets were converted to NetCDF format for consistency and ease of use.

Because the maps had varying spatial resolutions (ranging from 0.0083° to 0.5°), we harmonized all datasets to a common resolution of 0.5° (approximately 50 km at the equator). We then merged the processed maps by computing the mean, maximum, and minimum values at each grid cell, resulting in harmonized global maps of soil C (for the top 0–30 cm and 0–100 cm depths) and R_H at 0.5° resolution.

Grid cells with fewer than three soil C estimates or fewer than four R_H estimates were assigned NA values. Land and water grid cells were automatically distinguished by combining multiple datasets containing soil C and R_H information over land.
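The per-cell merging and NA rule described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the authors' processing code: it assumes the maps are already regridded to a shared 0.5° grid, represents missing values as `None`, and uses hypothetical function names.

```python
from statistics import mean


def merge_cell(values, min_count):
    """Return mean/max/min of the valid estimates, or None (NA) if too few."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return None
    return {"mean": mean(valid), "max": max(valid), "min": min(valid)}


def merge_maps(maps, min_count):
    """Merge equally shaped 2D grids (lists of rows) cell by cell."""
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    return [
        [merge_cell([m[i][j] for m in maps], min_count) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Three toy soil C maps on a 1 x 2 grid; the second cell has only two estimates.
soil_c_maps = [
    [[4.0, None]],
    [[6.0, 3.0]],
    [[8.0, 5.0]],
]
# The description sets the threshold at three estimates for soil C
# (four would be used for R_H).
merged = merge_maps(soil_c_maps, min_count=3)
print(merged)
```

Passing the threshold as a parameter lets the same routine serve both variables, since the description uses different minimum counts for soil C and R_H.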

Soil carbon turnover time (years), denoted as τ, was calculated under the assumption of a quasi-equilibrium state using the formula:

τ = C_S / R_H

where C_S is soil carbon stock and R_H is the heterotrophic respiration rate. The uncertainty range of τ was estimated for each grid cell using:

τ_max = C_S⁺ / R_H⁻

τ_min = C_S⁻ / R_H⁺

where C_S⁺ and C_S⁻ are the maximum and minimum soil C values, and R_H⁺ and R_H⁻ are the maximum and minimum R_H values, respectively.
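As a worked illustration of the turnover-time formulas above, the following Python sketch computes τ and its range for a single grid cell. The function name, the input values, and the use of the mean maps for the central τ are assumptions made for the example; units are assumed to be consistent (for instance a stock in kg C m⁻² and a flux in kg C m⁻² yr⁻¹, giving τ in years).

```python
def turnover_time(c_mean, c_max, c_min, rh_mean, rh_max, rh_min):
    """Turnover time and its uncertainty range for one grid cell.

    tau     = C_S  / R_H
    tau_max = C_S+ / R_H-   (largest stock over smallest flux)
    tau_min = C_S- / R_H+   (smallest stock over largest flux)
    """
    return {
        "tau": c_mean / rh_mean,
        "tau_max": c_max / rh_min,
        "tau_min": c_min / rh_max,
    }


# Illustrative values only, not taken from the dataset.
result = turnover_time(
    c_mean=10.0, c_max=12.0, c_min=8.0,
    rh_mean=0.5, rh_max=1.0, rh_min=0.25,
)
print(result)
```

Pairing the maximum stock with the minimum flux (and vice versa) gives the widest possible interval, so τ_min ≤ τ ≤ τ_max holds whenever the mean lies between the minimum and maximum of each input.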

To calculate the temperature sensitivity of decomposition (Q₁₀)—the factor by which decomposition rates increase with a 10 °C rise in temperature—we followed the method described in Koven et al. (2017). The uncertainty of Q₁₀ (maximum and minimum values) was derived using τ_max and τ_min, respectively.

More details are provided in:

Shoji Hashimoto, Akihiko Ito, Kazuya Nishina (submitted)

Reference

Koven, C. D., Hugelius, G., Lawrence, D. M. & Wieder, W. R. Higher climatological temperature sensitivity of soil carbon in cold than warm climates. Nat. Clim. Change 7, 817–822 (2017).

Table 1: List of soil carbon and heterotrophic respiration datasets used in this study.

Dataset	Repository/References (Dataset name)	Depth (cm)	ID in NetCDF file***
Global soil C	Global soil data task 2000 (IGBP-DIS)^1	0–100	3,-
	Shangguan et al. 2014 (GSDE)^2,3	0–100, 0–30*	1,1
	Batjes 2016 (WISE30sec)^4,5	0–100, 0–30	6,7
	Sanderman et al. 2017 (Soil-Carbon-Debt)^6,7	0–100, 0–30	5,5
	Soilgrids team and Hengl et al. 2017 (SoilGrids)^8,9	0–30**	-,6
	Hengl and Wheeler 2018 (LandGIS)^10	0–100, 0–30	4,4
	FAO 2022 (GSOC)^11	0–30	-,2
	FAO 2023 (HWSD2)^12	0–100, 0–30	2,3
Circumpolar soil C	Hugelius et al. 2013 (NCSCD)^13–15	0–100, 0–30	7,8
Global R_H	Hashimoto et al. 2015^16,17	-	1
	Warner et al. 2019 (Bond-Lamberty equation based)^18,19	-	2
	Warner et al. 2019 (Subke equation based)^18,19	-	3
	Tang et al. 2020^20,21	-	4
	Lu et al. 2021^22,23	-	5
	Stell et al. 2021^24,25	-	6
	Yao et al. 2021^26,27	-	7
	He et al. 2022^28,29	-	8

*The vertical depth intervals did not exactly match 100 cm and 30 cm. Therefore, weighted means were calculated for the 0–100 cm and 0–30 cm depths.
**Only the soil C stock data for the 0–30 cm depth is officially provided in the repository.
***IDs for 0–100 cm / 0–30 cm.

References

1. Global soil data task. Global Gridded Surfaces of Selected Soil Characteristics (IGBP-DIS). Preprint at https://doi.org/10.3334/ORNLDAAC/569 (2000).

2. Shangguan, W., Dai, Y., Duan, Q., Liu, B. & Yuan, H. A global soil data set for earth system modeling. J. Adv. Model. Earth Syst. 6, 249–263 (2014).

3. Land-atmosphere interaction research group at Sun Yat-sen University. The global soil dataset for Earth system modeling. http://globalchange.bnu.edu.cn/research/soilw (2014).

4. Batjes, N. H. Harmonized soil property values for broad-scale modelling (WISE30sec) with estimates of global soil carbon stocks. Geoderma 269, 61–68 (2016).

5. ISRIC World Soil Information. WISE derived soil properties on a 30 by 30 arc-seconds global grid. https://data.isric.org/geonetwork/srv/eng/catalog.search#/metadata/dc7b283a-8f19-45e1-aaed-e9bd515119bc (2016).

6. Sanderman, J., Hengl, T. & Fiske, G. J. Soil carbon debt of 12,000 years of human land use. Proc. Natl. Acad. Sci. 114, 9575–9580 (2017).

7. Sanderman, J. Soil-Carbon-Debt. https://github.com/whrc/Soil-Carbon-Debt (2017).

8. SoilGrids team. SoilGrids-global gridded soil information. https://files.isric.org/soilgrids/latest/data_aggregated/ (2020).

9. Hengl, T. et al. SoilGrids250m: Global gridded soil information based on machine learning. PLOS ONE 12, e0169748 (2017).

10. Hengl, T. & Wheeler, I. Soil organic carbon stock in kg/m² for 5 standard depth intervals (0–10, 10–30, 30–60, 60–100 and 100–200 cm) at 250 m resolution. Zenodo https://doi.org/10.5281/ZENODO.2536040 (2018).

11. FAO. Global soil organic carbon map. https://data.apps.fao.org/catalog/dataset/global-soil-organic-carbon-map (2022).

12. FAO. Harmonized world soil database v2.0. https://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/harmonized-world-soil-database-v20/en/ (2023).

13. Hugelius, G. et al. A new data set for estimating organic carbon storage to 3 m depth in soils of the northern circumpolar permafrost region. Earth Syst. Sci. Data 5, 393–402 (2013).

14.

Data from: A Machine Learning Model to Estimate Toxicokinetic Half-Lives of...
datasets.ai
s.cnmilf.com
0, 33, 8
Updated Apr 30, 2023
Cite
U.S. Environmental Protection Agency (2023). A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species [Dataset]. https://datasets.ai/datasets/a-machine-learning-model-to-estimate-toxicokinetic-half-lives-of-per-and-polyfluoro-alkyl-
Explore at:
Available download formats: 33, 8, 0
Dataset updated
Apr 30, 2023
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Authors
U.S. Environmental Protection Agency
Description
Data and code for "Dawson, D.E.; Lau, C.; Pradeep, P.; Sayre, R.R.; Judson, R.S.; Tornero-Velez, R.; Wambaugh, J.F. A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species. Toxics 2023, 11, 98. https://doi.org/10.3390/toxics11020098"

Includes a link to R-markdown file allowing the application of the model to novel chemicals.

This dataset is associated with the following publication: Dawson, D., C. Lau, P. Pradeep, R. Sayre, R. Judson, R. Tornero-Velez, and J. Wambaugh. A Machine Learning Model to Estimate Toxicokinetic Half-Lives of Per- and Polyfluoro-Alkyl Substances (PFAS) in Multiple Species. Toxics. MDPI, Basel, SWITZERLAND, 11(2): 98, (2023).
Data from: HOW TO PERFORM A META-ANALYSIS: A PRACTICAL STEP-BY-STEP GUIDE...
scielo.figshare.com
datasetcatalog.nlm.nih.gov
tiff
Updated Jun 4, 2023
Cite
Diego Ariel de Lima; Camilo Partezani Helito; Lana Lacerda de Lima; Renata Clazzer; Romeu Krause Gonçalves; Olavo Pires de Camargo (2023). HOW TO PERFORM A META-ANALYSIS: A PRACTICAL STEP-BY-STEP GUIDE USING R SOFTWARE AND RSTUDIO [Dataset]. http://doi.org/10.6084/m9.figshare.19899537.v1
Explore at:
Available download formats: tiff
Unique identifier
https://doi.org/10.6084/m9.figshare.19899537.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
Diego Ariel de Lima; Camilo Partezani Helito; Lana Lacerda de Lima; Renata Clazzer; Romeu Krause Gonçalves; Olavo Pires de Camargo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
ABSTRACT Meta-analysis is an adequate statistical technique to combine results from different studies, and its use has been growing in the medical field. Thus, not only knowing how to interpret meta-analysis, but also knowing how to perform one, is fundamental today. Therefore, the objective of this article is to present the basic concepts and serve as a guide for conducting a meta-analysis using R and RStudio software. For this, the reader has access to the basic commands in the R and RStudio software, necessary for conducting a meta-analysis. The advantage of R is that it is a free software. For a better understanding of the commands, two examples were presented in a practical way, in addition to revising some basic concepts of this statistical technique. It is assumed that the data necessary for the meta-analysis has already been collected, that is, the description of methodologies for systematic review is not a discussed subject. Finally, it is worth remembering that there are many other techniques used in meta-analyses that were not addressed in this work. However, with the two examples used, the article already enables the reader to proceed with good and robust meta-analyses. Level of Evidence V, Expert Opinion.
Datasets and R script
figshare.com
txt
Updated Jul 25, 2024
Cite
Paula Eterovick (2024). Datasets and R script [Dataset]. http://doi.org/10.6084/m9.figshare.25234633.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.25234633.v1
Dataset updated
Jul 25, 2024
Dataset provided by
figshare
Authors
Paula Eterovick
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A model ectotherm organism (larvae of the European Common Frog) was exposed to two ubiquitous stressors - elevated temperatures resulting from global warming and nitrate pollution - in a crossed experimental design. Health biomarkers were associated with underlying changes in larval gut bacteria. Bacterial composition and predicted metabolic pathways corroborated faster development at higher temperatures and reduced body condition under nitrate pollution. Microbiome adjustments seem to have buffered damage to larval health to a certain extent. However, the highest levels of both stressors culminated in reduced body condition and a hampered ability to accelerate development to escape a stressful environment.
Datasets and R script to replicate the theoretical modeling from the article...
zenodo.org
bin, txt
Updated Feb 17, 2023
Cite
vantaux amelie (2023). Datasets and R script to replicate the theoretical modeling from the article entitled Multiple hosts, multiple impacts: the role of vertebrate host diversity in shaping mosquito life history and pathogen transmission [Dataset]. http://doi.org/10.5281/zenodo.7645483
Explore at:
Available download formats: txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.7645483
Dataset updated
Feb 17, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
vantaux amelie
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Datasets and R script to replicate the theoretical modeling from the article entitled "Multiple hosts, multiple impacts: the role of vertebrate host diversity in shaping mosquito life history and pathogen transmission" by Vantaux A., Moiroux N., Dabire K. R., Cohuet A., Lefevre T. 2023
DIAMAS survey on Institutional Publishing - aggregated data
zenodo.org
data.niaid.nih.gov
bin, csv, zip
Updated Mar 13, 2025
Cite
Bianca Kramer; George Ross (2025). DIAMAS survey on Institutional Publishing - aggregated data [Dataset]. http://doi.org/10.5281/zenodo.10590503
Explore at:
Available download formats: csv, bin, zip
Unique identifier
https://doi.org/10.5281/zenodo.10590503
Dataset updated
Mar 13, 2025
Dataset provided by
Zenodo http://zenodo.org/
Authors
Bianca Kramer; George Ross
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The DIAMAS project investigates Institutional Publishing Service Providers (IPSP) in the broadest sense, with a special focus on those publishing initiatives that do not charge fees to authors or readers. To collect information on Institutional Publishing in the ERA, a survey was conducted among IPSPs between March-May 2024. This dataset contains aggregated data from the 685 valid responses to the DIAMAS survey on Institutional Publishing.

The dataset supplements D2.3 Final IPSP landscape Report Institutional Publishing in the ERA: results from the DIAMAS survey.

The data

Basic aggregate tabular data

Full individual survey responses are not being shared, to prevent the easy identification of respondents (in line with conditions set out in the survey questionnaire). This dataset contains full tables with aggregate data for all questions from the survey, with the exception of free-text responses, from all 685 survey respondents. This includes, per question, overall totals and percentages for the answers given, as well as the breakdown by both IPSP types: institutional publishers (IPs) and service providers (SPs). Tables at country level have not been shared, as cell values often turned out to be too low to prevent potential identification of respondents. The data is available in csv and docx formats, with csv files grouped and packaged into ZIP files. Metadata describing data type, question type, and question response rate is available in csv format. The R code used to generate the aggregate tables is made available as well.
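The kind of aggregate table described above (per-question totals and percentages, overall and by IPSP type) can be sketched in Python as follows. This is not the project's R code; the response records, the question key `q1`, and the field name `ipsp_type` are hypothetical and serve only to show the shape of the computation.

```python
from collections import Counter


def aggregate(responses, question):
    """Counts and percentages per answer: overall, and split by IPSP type."""

    def table(rows):
        # Unanswered questions (missing or None) are left out of the totals.
        counts = Counter(
            r[question] for r in rows if r.get(question) is not None
        )
        total = sum(counts.values())
        return {
            answer: {"n": n, "pct": round(100 * n / total, 1)}
            for answer, n in counts.items()
        }

    return {
        "all": table(responses),
        "IP": table([r for r in responses if r["ipsp_type"] == "IP"]),
        "SP": table([r for r in responses if r["ipsp_type"] == "SP"]),
    }


# Four made-up responses to a single-answer question.
responses = [
    {"ipsp_type": "IP", "q1": "yes"},
    {"ipsp_type": "IP", "q1": "no"},
    {"ipsp_type": "SP", "q1": "yes"},
    {"ipsp_type": "SP", "q1": "yes"},
]
tables = aggregate(responses, "q1")
print(tables)
```

Percentages are computed within each group, so the IP and SP columns each sum to 100 independently of the overall column.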

Files included in this dataset

survey_questions_data_description.csv - metadata describing data type, question type, as well as question response rate per survey question.

tables_raw_all.zip - raw tables (csv format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option. Zip file contains 180 csv files.

tables_raw_IP.zip - as tables_raw_all.zip, for responses from institutional publishers (IP) only. Zip file contains 180 csv files.

tables_raw_SP.zip - as tables_raw_all.zip, for responses from service providers (SP) only. Zip file contains 170 csv files.

tables_formatted_all.docx - formatted tables (docx format) with aggregated data per question for all respondents, with the exception of free-text responses. Questions with multiple answers have a table for each answer option.

tables_formatted_IP.docx - as tables_formatted_all.docx, for responses from institutional publishers (IP) only.

tables_formatted_SP.docx - as tables_formatted_all.docx, for responses from service providers (SP) only.

DIAMAS_Tables_single.R - R script used to generate raw tables with aggregated data for all single response questions

DIAMAS_Tables_multiple.R - R script used to generate raw tables with aggregated data for all multiple response questions

DIAMAS_Tables_layout.R - R script used to generate document with formatted tables from raw tables with aggregated data

DIAMAS Survey on Instititutional Publishing - data availability statement (pdf)

All data are made available under a CC0 license.
Data from: Data release for Linking land and sea through an...
catalog.data.gov
datasets.ai
Updated Oct 1, 2025
Cite
U.S. Geological Survey (2025). Data release for Linking land and sea through an ecological-economic model of coral reef recreation [Dataset]. https://catalog.data.gov/dataset/data-release-for-linking-land-and-sea-through-an-ecological-economic-model-of-coral-reef-r
Explore at:
Dataset updated
Oct 1, 2025
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Description
Coastal zones are popular recreational areas that substantially contribute to social welfare. Managers can use information about specific environmental features that people appreciate, and how these might change under different management scenarios, to spatially target actions to areas of high current or potential value. We explored how snorkelers’ experience would be affected by separate and combined land and marine management actions in West Maui, Hawaiʻi, using a Bayesian Belief Network (BBN) and a spatially explicit ecosystem services model. The BBN simulates recreational attractiveness by combining snorkelers’ preferences for coastal features with experts’ opinions on ecological dynamics, snorkeler behavior, and management actions. A choice experiment with snorkelers elucidated their preferences for sites with better ecological and water-quality conditions. Linking the economic elicitation to the spatially explicit BBN to evaluate land-sea management scenarios provides specific guidance on where and how to act in West Maui to maximize ecosystem service returns. Improving coastal water quality through sediment runoff and cesspool effluent reductions, and enhancing coral reef ecosystem conditions, positively affected overall snorkeling attractiveness across the study area, but with differential results at specific sites. The highest improvements were attained through joint land-sea management, driven by strong effects of efforts to increase fish abundance and reduce sediment; however, management priorities at individual beaches varied.
Data and R code for: Linking climate variability to demography in...
dataone.org
datasetcatalog.nlm.nih.gov
Updated Apr 11, 2025
Cite
Jack Thorley; Chris Duncan; Marta Manser; Tim Clutton-Brock (2025). Data and R code for: Linking climate variability to demography in cooperatively breeding meerkats [Dataset]. http://doi.org/10.5061/dryad.2ngf1vj11
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.2ngf1vj11
Dataset updated
Apr 11, 2025
Dataset provided by
Dryad Digital Repository
Authors
Jack Thorley; Chris Duncan; Marta Manser; Tim Clutton-Brock
Description
This dryad repository contains data sets and R scripts for our Ecological Monographs (2025) paper: Linking climate variability to demography in cooperatively breeding meerkats. The aim of our study was to develop a mechanistic understanding of how variation in rainfall and temperature affect meerkat demography. Specifically, we:

- Identified the critical period during which rainfall influences vegetation productivity at the field site.
- Investigated how vegetation productivity and temperature influence meerkat foraging performance.
- Examined how these climatic variables impact daily weight fluctuations and overall body condition.
- Analyzed conditions under which meerkats fail to accumulate enough mass during foraging to offset overnight losses.
- Explored the influence of body condition on reproduction and survival.
- Projected future temperature trends and their implications for meerkat populations up to the mid and end of the century.

To achieve these aims, our study included numerous data s...

The data was collected as part of a long-term study of meerkats in the southern Kalahari Desert. All methods of data collection are outlined in the paper and are also summarised in Clutton-Brock and Manser (2016). The project's data is stored on a centralised SQL database and was processed for analyses by the lead author using R.
All of the analyses are tied, directly or indirectly, to the effects of climate and/or vegetation productivity. Climate data included daily rainfall measured on-site, and temperature data taken from the NOAA Climate Prediction Centre. We also extracted future temperatures from the Coupled Model Intercomparison Project 6 (CMIP6) for two future pathways: a medium-risk scenario (SSP2-4.5) and a high-risk scenario (SSP5-8.5), which we obtained from the Copernicus Climate Change service (https://cds.climate.copernicus.eu/). We selected the EC-EARTH3-CC model from CMIP6. Vegetation productivity was assessed using NDVI from the MODIS MOD13Q1 product the GI...

# Data and R code for: Linking climate variability to demography in cooperatively breeding meerkats

📁 Thorley_2025_EcologicalMonographs.zip

Dataset DOI: 10.5061/dryad.2ngf1vj11

Each analysis is provided in a separate R script, with the data sets needed for each script explained in detail.

(1) Part1_RainfallNDVI_Climwin.R

- **MeerkatNDVIMasked.csv** Average NDVI across the ranges of all meerkat groups (MODIS MOD13Q1).
  - date – dd/mm/yyyy
  - mean_ndvi_masked – NDVI is provided every 16 days

- **MeerkatRainReserve.csv** Daily rainfall at the field site.
  - date – dd/mm/yyyy
  - rain_method – Indicates how the daily value was collected (Manual / NOAA / Station)
  - rain_reserve – Total daily rainfall (mm)

(2) Part2_ForagingPerformance.R

- **MeerkatForagingData.csv** Historical data set of focal observations (1996–2001).
  - IndividID – Individual identity
  - GroupRef – Group identity
  - Focal_ID – Unique ident...
Insider Threat Test Dataset
kilthub.cmu.edu
txt
Updated May 30, 2023
Cite
Brian Lindauer (2023). Insider Threat Test Dataset [Dataset]. http://doi.org/10.1184/R1/12841247.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1184/R1/12841247.v1
Dataset updated
May 30, 2023
Dataset provided by
Carnegie Mellon University
Authors
Brian Lindauer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Insider Threat Test Dataset is a collection of synthetic insider threat test datasets that provide both background and malicious actor synthetic data.

The CERT Division, in partnership with ExactData, LLC, and under sponsorship from DARPA I2O, generated a collection of synthetic insider threat test datasets. These datasets provide both synthetic background data and data from synthetic malicious actors.

For more background on this data, please see the paper, Bridging the Gap: A Pragmatic Approach to Generating Insider Threat Data.

Datasets are organized according to the data generator release that created them. Most releases include multiple datasets (e.g., r3.1 and r3.2). Generally, later releases include a superset of the data generation functionality of earlier releases. Each dataset file contains a readme file that provides detailed notes about the features of that release.

The answer key file answers.tar.bz2 contains the details of the malicious activity included in each dataset, including descriptions of the scenarios enacted and the identifiers of the synthetic users involved.
On-street Parking Bays
researchdata.edu.au
data.melbourne.vic.gov.au
Updated Mar 7, 2023
Cite
data.vic.gov.au (2023). On-street Parking Bays [Dataset]. https://researchdata.edu.au/on-street-parking-bays/2296305
Explore at:
Dataset updated
Mar 7, 2023
Dataset provided by
data.vic.gov.au
Description
Upcoming Changes: Please note that our parking system is being improved and this dataset may be disrupted. See more information here.

This dataset contains spatial polygons which represent parking bays across the city. Each bay can also link to its parking meter and parking sensor information.

How the data joins:

Three datasets make up the live parking sensor release: the on-street parking bay sensors, the on-street parking bays, and the on-street car park bay information.
The datasets join as follows. The on-street parking bay sensors join to the on-street parking bays by the marker_id attribute. The on-street parking bay sensors join to the on-street car park bay restrictions by the bay_id attribute. The on-street parking bays and the on-street car park bay information don’t currently join.

Please see City of Melbourne's disclaimer regarding the use of this data. https://data.melbourne.vic.gov.au/stories/s/94s9-uahn
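The joins described for the parking datasets can be sketched in Python as below. Only the join keys `marker_id` and `bay_id` come from the description; the sample records and the other field names (`status`, `polygon`, `description`) are hypothetical, and real exports may be shaped differently.

```python
# Hypothetical sample rows for the three datasets.
sensors = [
    {"bay_id": 1, "marker_id": "M1", "status": "Present"},
    {"bay_id": 2, "marker_id": "M2", "status": "Unoccupied"},
]
bays = [
    {"marker_id": "M1", "polygon": "POLYGON(...)"},
]
restrictions = [
    {"bay_id": 1, "description": "2P"},
]


def join_parking(sensors, bays, restrictions):
    """Attach the matching bay (by marker_id) and restriction (by bay_id)."""
    bay_by_marker = {b["marker_id"]: b for b in bays}
    restriction_by_bay = {r["bay_id"]: r for r in restrictions}
    return [
        {
            **s,
            # None marks a sensor with no matching record in that dataset.
            "bay": bay_by_marker.get(s["marker_id"]),
            "restriction": restriction_by_bay.get(s["bay_id"]),
        }
        for s in sensors
    ]


joined = join_parking(sensors, bays, restrictions)
for row in joined:
    print(row)
```

The sensor dataset is the hub here because it is the only one carrying both keys, which matches the note that the bays and the bay information do not join to each other directly.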
Meta Kaggle Code
kaggle.com
zip
Updated Oct 23, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
Available download formats: zip (162197891554 bytes)
Dataset updated
Oct 23, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; for example, folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; for example, 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
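
As a minimal sketch of the layout described above, the folder numbers for a given kernel version id follow from integer division. The function name is hypothetical, and whether folder names on disk are zero-padded is not stated here, so the sketch returns plain integers rather than a path.

```python
def version_folders(version_id: int) -> tuple[int, int]:
    """Return (top_level_folder, sub_folder) numbers for a kernel version id."""
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top_level = version_id // 1_000_000          # 1 million ids per top-level folder
    sub_folder = (version_id // 1_000) % 1_000   # 1 thousand ids per subfolder
    return top_level, sub_folder


print(version_folders(123_456_789))  # prints (123, 456)
```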

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Benchmark Datasets for Entity Linking from Tabular Data
zenodo.org
zip
Updated Sep 19, 2025
Cite
Roberto Avogadro; Roberto Avogadro (2025). Benchmark Datasets for Entity Linking from Tabular Data [Dataset]. http://doi.org/10.5281/zenodo.17160156
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.17160156
Dataset updated
Sep 19, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Roberto Avogadro; Roberto Avogadro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
📖 Benchmark Datasets for Entity Linking from Tabular Data (Version 2)

This archive provides a benchmark suite for evaluating entity linking algorithms on structured tabular data.
It is organised into two parts:
• Challenge datasets (HTR1, HTR2): From the SemTab Table-to-KG Challenge, widely used in academic evaluations of table-to-KG alignment systems. Each is a dataset (a collection of many tables) provided with ground truth and candidate mappings.
👉 Please also cite the SemTab Challenge when using these resources.
• Real-world tables (Company, Movie, SN):
• Company — one table constructed via SPARQL queries on Wikidata, with both Wikidata and Crunchbase ground truths.
• Movie — one table constructed via SPARQL queries on Wikidata.
• SN (Spend Network) — one procurement table from the enRichMyData (EMD) project, manually annotated and including NIL cases for mentions with no known Wikidata match.

A shared top-level folder (mention_to_qid/) provides JSON files mapping surface mentions to candidate QIDs for these real-world tables.

⸻

📂 Contents

Each dataset or table includes:
• One or more input CSV tables
• Ground truth files mapping mentions/cells to Wikidata QIDs (or NIL)
• Candidate mappings (mention_to_qid/*.json), sometimes multiple variants
• Optional files such as column_classifications.json or cell_to_qid.json
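
One way these files might be used together is to check how often the candidate lists contain the ground-truth entity. The sketch below assumes a JSON layout of mention to list of candidate QIDs and a ground truth of mention to QID or "NIL"; the real file schemas may differ, and the mentions and QIDs shown are placeholders, not values from the archive.

```python
import json

# Hypothetical contents; the real JSON layout in mention_to_qid/ may differ.
candidates_json = '{"Acme Corp": ["Q100", "Q200"], "Foo Ltd": ["Q300"], "Unknown Co": []}'
ground_truth = {"Acme Corp": "Q200", "Foo Ltd": "Q999", "Unknown Co": "NIL"}


def candidate_coverage(candidates, truth, nil_label="NIL"):
    """Fraction of non-NIL mentions whose true QID appears among the candidates."""
    linkable = [mention for mention, qid in truth.items() if qid != nil_label]
    if not linkable:
        return 0.0
    hits = sum(1 for mention in linkable if truth[mention] in candidates.get(mention, []))
    return hits / len(linkable)


candidates = json.loads(candidates_json)
coverage = candidate_coverage(candidates, ground_truth)
```

NIL mentions are left out of the denominator here because no candidate can match them; a linker's ability to predict NIL would need a separate measure.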

⸻

📝 Licensing
• HTR1 & HTR2: CC BY 4.0
• Company & Movie: Derived from Wikidata (public domain; CC0 1.0)
• SN: CC BY 4.0 (from the enRichMyData project)

⸻

📌 Citation

If you use these datasets, please cite:
• This Zenodo record (Version 2):
Avogadro, R., & Rauniyar, A. (2025). Benchmark Datasets for Entity Linking from Tabular Data (Version 2). Zenodo. https://doi.org/10.5281/zenodo.15888942
• The SemTab Challenge (for HTR1/HTR2):
SemTab: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (Table-to-KG). (Cite the relevant SemTab overview paper for the year you reference.)
• Wikidata: Data retrieved from Wikidata (public domain; CC0 1.0).
• enRichMyData (for SN / Spend Network): Project resources from enRichMyData, licensed under CC BY 4.0.
Data from: SBIR - STTR Data and Code for Collecting Wrangling and Using It
search.dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Allard, Grant (2023). SBIR - STTR Data and Code for Collecting Wrangling and Using It [Dataset]. http://doi.org/10.7910/DVN/CKTAZX
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/CKTAZX
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Allard, Grant
Description
Data set consisting of data joined for analyzing the SBIR/STTR program. Data consists of individual awards and agency-level observations. The R and Python code required for pulling, cleaning, and creating useful data sets has been included.

Allard_Get and Clean Data.R: This file provides the code for getting, cleaning, and joining the numerous data sets that this project combined. The code is written in the R language and can be used in any R environment running R 3.5.1 or higher. If the other files in this Dataverse are downloaded to the working directory, this R code can replicate the original study without the user needing to update any file paths.

Allard SBIR STTR WebScraper.py: This is the code I deployed to multiple Amazon EC2 instances to scrape data on each individual award in my data set, including the contact info and DUNS data.

Allard_Analysis_APPAM SBIR project: Forthcoming.

Allard_Spatial Analysis: Forthcoming.

Awards_SBIR_df.Rdata: This unique data set consists of 89,330 observations spanning the years 1983 - 2018 and accounting for all eleven SBIR/STTR agencies. It consists of data collected from the Small Business Administration's Awards API and unique data collected through web scraping by the author.

Budget_SBIR_df.Rdata: 246 observations for 20 agencies across 25 years of their budget performance in the SBIR/STTR program. Data was collected from the Small Business Administration using the Annual Reports Dashboard, the Awards API, and an author-designed web crawler of the websites of awards.

Solicit_SBIR-df.Rdata: This data consists of observations of solicitations published by agencies for the SBIR program. This data was collected from the SBA Solicitations API.

Primary Sources

Small Business Administration. “Annual Reports Dashboard,” 2018. https://www.sbir.gov/awards/annual-reports.
Small Business Administration. “SBIR Awards Data,” 2018. https://www.sbir.gov/api.
Small Business Administration. “SBIR Solicit Data,” 2018. https://www.sbir.gov/api.
TM-Link
researchdata.edu.au
data.gov.au
Updated Apr 18, 2019
Cite
IP Australia (2019). TM-Link [Dataset]. https://researchdata.edu.au/tm-link/2980846
Dataset updated
Apr 18, 2019
Dataset provided by
Data.gov (https://data.gov/)
Authors
IP Australia
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data Dictionary available here

2020 August UPDATE

* Fixed some quality issues with dates in the application table.
* Fixed some quality issues with names in the owner table.

2020 March UPDATE

* The TM-Link dataset has been updated to include more recent data and also has expanded to include more information. As such, the structure of the dataset has also changed.

---
TM-Link is an international dataset IP Australia and Swinburne University have developed in collaboration. The dataset provides information from various jurisdictions, modelled under a common schema for greater accessibility to researchers and analysts. TM-Link also links together similar trade marks from different countries based on common information, such as similar trade mark phrases and applicant names. These links identify families of international trade marks, which provide a new and unique insight into international branding trends and export behaviours.
IP Australia and Swinburne University are looking to continually develop TM-Link to become a core part of the global IP data landscape. If you have any suggestions or requests to model any additional data points, or improve the current accuracy of the data, please let us know via email to ipdataplatform@ipaustralia.gov.au.

For more information on the linking algorithm, please see:

Petrie S, Kollmann T, Codoreanu A, Thomson R & Webster E (2019); International Trademarking and Regional Export Performance. Available at SSRN: https://ssrn.com/abstract=3445244

For more information on TM-Link data collection and descriptive analyses, please see:

Petrie S, Adams M, Mitra‐Kahn B, Johnson M, Thomson R, Jensen PH, Palangkaraya A, & Webster EM (2019); TM-Link: An Internationally Linked Trade Mark Database. Australian Economic Review, Forthcoming. Available at SSRN: https://ssrn.com/abstract=3511526
GAL Assessment Units 1000m 20160522 v01
researchdata.edu.au
Updated Dec 7, 2018
Cite
Bioregional Assessment Program (2018). GAL Assessment Units 1000m 20160522 v01 [Dataset]. https://researchdata.edu.au/gal-assessment-units-20160522-v01/2989375
Dataset updated
Dec 7, 2018
Dataset provided by
Data.gov (https://data.gov/)
Authors
Bioregional Assessment Program
License
Attribution 2.5 (CC BY 2.5) (https://creativecommons.org/licenses/by/2.5/)
License information was derived automatically
Description
Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.

To keep processing and rendering efficient, this is a clip of the Vector Reference grid for the GAL region.

It was created by applying a 50km buffer to the extent of the Hunter PAE and then selecting all grid cells that intersect with the extent.

The unique ID field for each grid cell is AUID and starts from 1 in the reference grid. The grid also has a column id and row for easy reference.

The grid is in Australia Albers (GDA94) (EPSG 3577).

Purpose

This is an attempt to standardise (where possible) outputs of models from BA assessments and is the template to be used for GAL (clipped from whole of BA reference Grid) for the groundwater and potentially surface water model outputs.

Dataset History

The minimum bounding geometry tool in ArcGIS 10.1 was used to return the extent of the Bioregion boundary. This was then buffered with a 50km radius.

The select location tool in ArcGIS 10.1 was then used to select all gridcells within the buffered extent.

An export of the grid cells was then created to produce a rectangle reference grid of the GAL region.

The file contains 2 shapefiles:

1) The grid cells clipped to the boundary

2) The boundary extents as a reference of the Region
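
The buffer-and-select steps in this history can be approximated in plain Python. This is a rough sketch and not the ArcGIS workflow itself: it expands a rectangular extent by a fixed distance (which ignores the rounded corners of a true radius buffer), all coordinates are made up, and numbering AUID from 1 in row-major order is an assumption.

```python
def buffer_extent(extent, distance):
    """Expand a (xmin, ymin, xmax, ymax) box outward by distance on every side."""
    xmin, ymin, xmax, ymax = extent
    return (xmin - distance, ymin - distance, xmax + distance, ymax + distance)


def intersects(a, b):
    """True when two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_cells(grid_origin, cell_size, n_cols, n_rows, extent):
    """Return (auid, col, row) for every grid cell that intersects the extent.

    AUID is numbered from 1 in row-major order over the whole reference grid,
    which is an assumption about how the ids were assigned.
    """
    x0, y0 = grid_origin
    selected = []
    for row in range(n_rows):
        for col in range(n_cols):
            cell = (x0 + col * cell_size, y0 + row * cell_size,
                    x0 + (col + 1) * cell_size, y0 + (row + 1) * cell_size)
            if intersects(cell, extent):
                selected.append((row * n_cols + col + 1, col, row))
    return selected


# Made-up coordinates in metres (EPSG 3577 is a metre-based projection).
boundary = (60_000.0, 60_000.0, 61_000.0, 61_000.0)
buffered = buffer_extent(boundary, 50_000.0)  # 50 km buffer
cells = select_cells((0.0, 0.0), 1_000.0, 200, 200, buffered)
```

Keeping the ids from the full reference grid, rather than renumbering the clipped cells, is what would let outputs from different regions be compared cell by cell.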

Dataset Citation

Bioregional Assessment Programme (XXXX) GAL Assessment Units 1000m 20160522 v01. Bioregional Assessment Derived Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/96dffeea-5208-4cfc-8c5d-408af9ac508e.

Dataset Ancestors

* Derived From BA ALL Assessment Units 1000m Reference 20160516_v01

* Derived From BA ALL Assessment Units 1000m 'super set' 20160516_v01
HUN AWRA-R simulation nodes v01
data.gov.au
researchdata.edu.au
zip
Updated Apr 13, 2022
Cite
Bioregional Assessment Program (2022). HUN AWRA-R simulation nodes v01 [Dataset]. https://data.gov.au/data/dataset/fda20928-d486-49d2-b362-e860c1918b06
Explore at:
Available download formats: zip (9874)
Dataset updated
Apr 13, 2022
Dataset authored and provided by
Bioregional Assessment Program
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple datasets. The source dataset is identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

The dataset consists of an Excel spreadsheet and a shapefile representing the locations of simulation nodes used in the AWRA-R model. Some of the nodes correspond to gauging station locations or dam locations, whereas other locations represent river confluences or catchment outlets which have no gauging. These are marked as "Dummy".

Purpose

Locations are used as pour points in order to define reach areas for river system modelling.

Dataset History

Subset of data for the Hunter that was extracted from the Bureau of Meteorology's Hydstra system and includes all gauges where data has been received from the lead water agency of each jurisdiction. Simulation nodes were added in locations in which the model will provide simulated streamflow.

There are 3 files that have been extracted from the Hydstra database to aid in identifying sites in each bioregion and the type of data collected from each one. These data were used to determine the simulation node locations where model outputs were generated.

The 3 files contained within the source dataset used for this determination are:

Site - lists all sites available in Hydstra from data providers. The data provider is listed in the #Station as _xxx. For example, sites in NSW are _77, QLD are _66.

Some sites do not have locational information and will not be able to be plotted.

Period - the period table lists all the variables that are recorded at each site and the period of record.

Variable - the variable table shows variable codes and names which can be linked to the period table.
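
The relationships between these three tables can be sketched as follows. Only the _77 (NSW) and _66 (QLD) suffixes come from the description; the station id format, the field names, and the sample rows are assumptions made for illustration.

```python
# Provider codes mentioned in the description; other codes are not listed there.
PROVIDER_BY_SUFFIX = {"77": "NSW", "66": "QLD"}


def split_station(station: str) -> tuple[str, str]:
    """Split a station id of the assumed form '<site>_<provider>' at the last underscore."""
    site, sep, suffix = station.rpartition("_")
    if not sep or not site or not suffix:
        raise ValueError(f"no provider suffix in {station!r}")
    return site, suffix


def provider_name(station: str) -> str:
    """Look up the data provider for a station id, or 'unknown' if not mapped."""
    _, suffix = split_station(station)
    return PROVIDER_BY_SUFFIX.get(suffix, "unknown")


# Hypothetical rows: the period table is linked to the variable table by code.
variables = {"V1": "example variable name"}
periods = [{"station": "000001_77", "variable": "V1"}]
labelled = [
    {
        **period,
        "variable_name": variables.get(period["variable"]),
        "provider": provider_name(period["station"]),
    }
    for period in periods
]
```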

Relevant location information and other data were extracted to construct the spreadsheet and shapefile within this dataset.

Dataset Citation

Bioregional Assessment Programme (XXXX) HUN AWRA-R simulation nodes v01. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/fda20928-d486-49d2-b362-e860c1918b06.

Dataset Ancestors

Derived From National Surface Water sites Hydstra
Variable Selection with Multiply-Imputed Datasets: Choosing Between Stacked...
tandf.figshare.com
pdf
Updated Jun 3, 2023
Cite
Jiacong Du; Jonathan Boss; Peisong Han; Lauren J. Beesley; Michael Kleinsasser; Stephen A. Goutman; Stuart Batterman; Eva L. Feldman; Bhramar Mukherjee (2023). Variable Selection with Multiply-Imputed Datasets: Choosing Between Stacked and Grouped Methods [Dataset]. http://doi.org/10.6084/m9.figshare.19111441.v2
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.19111441.v2
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
Jiacong Du; Jonathan Boss; Peisong Han; Lauren J. Beesley; Michael Kleinsasser; Stephen A. Goutman; Stuart Batterman; Eva L. Feldman; Bhramar Mukherjee
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Penalized regression methods are used in many biomedical applications for variable selection and simultaneous coefficient estimation. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors. This article considers a general class of penalized objective functions which, by construction, force selection of the same variables across imputed datasets. By pooling objective functions across imputations, optimization is then performed jointly over all imputed datasets rather than separately for each dataset. We consider two objective function formulations that exist in the literature, which we will refer to as “stacked” and “grouped” objective functions. Building on existing work, we (i) derive and implement efficient cyclic coordinate descent and majorization-minimization optimization algorithms for continuous and binary outcome data, (ii) incorporate adaptive shrinkage penalties, (iii) compare these methods through simulation, and (iv) develop an R package miselect. Simulations demonstrate that the “stacked” approaches are more computationally efficient and have better estimation and selection properties. We apply these methods to data from the University of Michigan ALS Patients Biorepository aiming to identify the association between environmental pollutants and ALS risk. Supplementary materials for this article are available online.
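
For orientation, the two formulations named in this abstract are often written along the following lines. This is a sketch under assumptions and not the article's own notation: a lasso-type penalty is used as the example, observation weights and adaptive penalty weights are omitted, and the scaling by n is one convention among several. Here D is the number of imputed datasets, n the number of subjects, and p the number of predictors.

```latex
% Stacked: one coefficient vector \beta shared by all D imputed datasets
\hat{\beta}_{\text{stacked}}
  = \arg\min_{\beta}\;
    -\frac{1}{n}\sum_{d=1}^{D}\sum_{i=1}^{n}
      \log L\!\left(\beta \mid y_{di}, x_{di}\right)
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Grouped: one vector \beta_d per imputed dataset, tied together by a group penalty
(\hat{\beta}_1,\dots,\hat{\beta}_D)
  = \arg\min_{\beta_1,\dots,\beta_D}\;
    -\frac{1}{n}\sum_{d=1}^{D}\sum_{i=1}^{n}
      \log L\!\left(\beta_d \mid y_{di}, x_{di}\right)
    + \lambda \sum_{j=1}^{p} \sqrt{\sum_{d=1}^{D} \beta_{dj}^{2}}
```

In the stacked form a variable is selected for every imputation because there is only one coefficient per variable; in the grouped form the penalty acts on the D coefficients of variable j together, so they enter or leave the model as a group.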
GAL Surface Water Reaches for Risk and Impact Analysis 20180803
researchdata.edu.au
data.gov.au
Updated Dec 7, 2018
Cite
Bioregional Assessment Program (2018). GAL Surface Water Reaches for Risk and Impact Analysis 20180803 [Dataset]. https://researchdata.edu.au/gal-surface-water-analysis-20180803/2989417
Dataset updated
Dec 7, 2018
Dataset provided by
Data.gov (https://data.gov/)
Authors
Bioregional Assessment Program
License
Attribution 2.5 (CC BY 2.5) (https://creativecommons.org/licenses/by/2.5/)
License information was derived automatically
Description
Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement.

The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

Dataset History

Stream network constructed and defined using datasets shown in the Lineage.

The stream network was constructed using surface water nodes to define reaches. Using data from the stream network in the lineage, each reach was then assigned one of the following classifications:

1. surface water change due to hydrology

2. no change modelled at link node within PAE

3. modelled no change at link node

4. modelled change at link node

5. assumed change due to proximity to mine pit

6. assumed change due to hydrology

Further tie-breaks were decided based on stream order or stream segment length.

Dataset Citation

Bioregional Assessment Programme (2017) GAL Surface Water Reaches for Risk and Impact Analysis 20180803. Bioregional Assessment Derived Dataset. Viewed 12 December 2018, http://data.bioregionalassessments.gov.au/dataset/64c4d16f-bdfa-4fd6-bd72-c459503003bd.

Dataset Ancestors

* Derived From Onsite and offsite mine infrastructure for the Carmichael Coal Mine and Rail Project, Adani Mining Pty Ltd 2012

* Derived From Alpha Coal Project Environmental Impact Statement

* Derived From Geofabric Surface Cartography - V2.1

* Derived From QLD Exploration and Production Tenements (20140728)

* Derived From China Stone Coal Project initial advice statement

* Derived From Kevin's Corner Project Environmental Impact Statement

* Derived From Galilee surface water modelling nodes

* Derived From Geoscience Australia GEODATA TOPO series - 1:1 Million to 1:10 Million scale

* Derived From China First Galilee Coal Project Environmental Impact Assessment

* Derived From GEODATA TOPO 250K Series 3

* Derived From Seven coal mines included in Galilee surface water modelling
