Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This package contains data from several text analysis types (term extraction, contract analysis, topic modeling, network mapping), based on survey data in which researchers selected research outputs related to the 17 Sustainable Development Goals (SDGs). These data are used as input to improve the current SDG classification model from v4.0 to v5.0.
The Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each goal, specific targets and indicators are defined to monitor progress towards reaching those goals by 2030. In an effort to capture how research is contributing to moving the needle on those challenges, we earlier built an initial classification model that makes it possible to quickly identify which research output is related to which SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)
The initiative started in 2017 within the Aurora Universities Network, in the working group "Societal Impact and Relevance of Research", to investigate and make visible 1. what research is being done that is relevant to topics or challenges that live in society (for the proof of practice this has been scoped down to the SDGs), and 2. what the effect or impact is of implementing those research outcomes on those societal challenges (this has also been scoped down to research output being cited in policy documents from national and local governments and NGOs).
Context of this dataset | classification model improvement workflow
The classification model we used consists of 17 different search queries on the Scopus database.
Methods used to do the text analysis
Software used to do the text analyses
CorTexT: The CorTexT Platform is the digital platform of the LISIS Unit and a project launched and sustained by IFRIS and INRAE. This platform aims at empowering open research and studies in the humanities on the dynamics of science, technology, innovation and knowledge production.
Resource with interactive visualisations
Based on the text analysis data we have created a website that brings all the SDG interactive diagrams together for you to scroll through: https://sites.google.com/vu.nl/sdg-survey-analysis-results/
Data set content
In the dataset root you'll find the following folders and files:
Inside an /sdg01-17/-folder you'll find the following:
note: the .csv files are actually tab-separated.
Contribute and improve the SDG Search Queries
We welcome you to join the Github community and to fork, branch, improve and make a pull request to add your improvements to the new version of the SDG queries. https://github.com/Aurora-Network-Global/sdg-queries
There are two objectives of this shape retrieval contest: a) to evaluate partial similarity between query and target objects and retrieve complete 3D models that are relevant to a partial query object; b) to retrieve 3D models that are relevant to a query depth map. This task corresponds to a real-life scenario where the query is a 3D range scan of an object acquired from an arbitrary view direction. The algorithm should retrieve the relevant 3D objects from a database.
Task description: In response to a given set of queries, the task is to evaluate similarity scores with the target models and return an ordered ranked list along with the similarity scores for each query. The set of queries consists either of partial 3D models or of range images. Participants may submit ranked lists for either of the query sets or both; there is no obligation to submit ranked lists for both query sets.
Dataset: The first query set consists of 20 partial 3D models obtained by cutting parts from complete models. The objective is to retrieve the models to which the query part may belong. The file format for the partial query models is the ASCII Object File Format (.off). The second query set is composed of 20 range images, acquired by capturing range data of 20 models from arbitrary view directions using a desktop 3D scanner. The file format is again the ASCII Object File Format (.off), representing each scan as a triangular mesh. The target database is the same for both query sets and contains 720 complete 3D models categorized into 40 classes, with 18 models per class. The file format for the 3D models is the ASCII Object File Format (*.off).
Contents: 3D models, classification files, evaluation software, and images.
Paper: Dutagaci, Helin, Godil, Afzal, et al. "SHREC'09 track: querying with partial models." Proceedings of the 2nd Eurographics conference on 3D Object Retrieval. Eurographics Association, 2009. https://doi.org/10.5555/2381128.2381144
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
Methodology
A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
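As an illustration of the prediction step, the following is a minimal Python sketch using the fastText lid.176.bin model [2] on a concatenated title and abstract. The function name and the simplified fallback logic are assumptions for illustration, not the pipeline's actual code.

import fasttext

# Load the pre-trained language identification model (lid.176.bin, see [2]).
model = fasttext.load_model("lid.176.bin")

def predict_language(title, abstract, mag_title=None, mag_abstract=None):
    # Crossref Metadata title/abstract first, MAG as fallback when empty.
    text = " ".join(filter(None, [title, abstract]))
    if not text.strip():
        text = " ".join(filter(None, [mag_title, mag_abstract]))
    # fastText expects a single line of text.
    text = text.replace("\n", " ").strip()
    labels, probs = model.predict(text)
    # Labels look like '__label__en'; strip the prefix to get the ISO code.
    return labels[0].replace("__label__", ""), float(probs[0])

print(predict_language("A study of coral reefs", "We analyse reef bleaching over time."))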
Query or Download
The data is publicly accessible in BigQuery in the following two tables:
When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.
See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.
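For reference, here is a minimal Python sketch of running a query from your own Google Cloud project with the BigQuery client library. The project id, dataset/table name, and column names are placeholders based on the dataset description, not the actual table identifiers.

from google.cloud import bigquery

# Run the query from *your own* Google Cloud project, as noted above.
client = bigquery.Client(project="your-gcp-project-id")  # placeholder project id

# Placeholder table reference and column names (DOI, title, ISO language code,
# prediction probability) -- substitute the actual table you loaded or queried.
sql = """
    SELECT doi, title, language_code, probability
    FROM `your-gcp-project-id.coki.language`
    LIMIT 10
"""

for row in client.query(sql).result():
    print(row.doi, row.language_code, row.probability)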
Code
The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language
License
COKI Language Dataset © 2022 by Curtin University is licensed under CC BY 4.0.
Attributions
This work contains information from:
References
[1] https://doi.org/10.5281/zenodo.6366695
[2] https://fasttext.cc/docs/en/language-identification.html
[3] https://modelpredict.com/language-identification-survey
The MAST Archive at STScI TAP endpoint for observational data, saved in the Common Archive Data Model format and made available through the ObsCore limited view. The Table Access Protocol (TAP) lets you execute queries against our database tables and inspect various metadata. Upload is not currently supported. Missions and projects with data available through the CAOMTAP service include: BEFS, EUVE, FUSE, GALEX, HLA, HST, HUT, IUE, JWST, K2, KEPLER, PS1 (PanSTARRS 1) Data Release 2, SPITZER_SHA, SWIFT, TESS, TUES, WUPPE.
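As a rough illustration, the service can be queried from Python with a generic TAP client such as pyvo. The endpoint URL below is a placeholder (use the CAOMTAP URL published by MAST); ivoa.obscore and obs_collection are the standard ObsCore table and column names.

from pyvo.dal import TAPService

# Placeholder: substitute the MAST CAOMTAP endpoint URL advertised by the archive.
service = TAPService("https://<mast-caomtap-endpoint>")

# ObsCore limited view: ivoa.obscore is the standard table name and
# obs_collection holds the mission/project (e.g. TESS, JWST, HST).
results = service.search("""
    SELECT TOP 10 obs_id, obs_collection, target_name, t_min, t_max
    FROM ivoa.obscore
    WHERE obs_collection = 'TESS'
""")
print(results.to_table())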
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
THIS VERSION IS OUTDATED, PLEASE CHECK OUT THE LAST VERSION HERE: https://zenodo.org/record/6771217
This repository contains result data for the paper "Open modeling of electricity and heat demand curves for all residential buildings in Germany".
The published data includes residential electricity and heat demand profiles for every building in Germany. It was created with the open source tool eGon-data within the research project eGon. All input data sets as well as the code are available under open source licenses.
Files
Database structure
After restoring the backup file, the data is stored in different schemas: society, openstreetmap and demand. Different tables have to be combined to create the final demand time series for heat and electricity. In the following, the tables and the matching methods are described.
The schema society includes data from Census 2011 on population in 100m x 100m cells ('Census cells'). The cells are georeferenced and have a unique id.
Schema: society
Schema: openstreetmap
The schema openstreetmap includes data on residential buildings. All buildings hold an internal building_id. All residential buildings extracted from OpenStreetMap are stored in openstreetmap.osm_buildings_residential, including osm_id and the internal building_id. Additional synthetic buildings are stored in openstreetmap.osm_buildings_synthetic.
Schema: demand
With the profile_ids in egon_household_electricity_profile_of_buildings, specific profiles from iee_household_load_profiles are mapped to all residential buildings. The profiles then need to be scaled by their annual sum and the corresponding scaling factors, which can be found in egon_household_electricity_profile_in_census_cell and matched per census cell id.
Heat demand profiles per building can be created by combining the tables egon_peta_heat, heat_idp_pool and heat_timeseries_selected_profiles. In addition, weather data (e.g. from ERA5, located in additional_data/) is needed to distribute the annual heat demands to single days. This is included in the example script, the usage is described below.
Weather data and the used climate zones are not included in the database. They are stored in files which are part of the additional_data/ folder. In this folder, you find the following data sets:
Example queries
Electricity profiles: The demand profiles for residential buildings can be obtained using the tables stored in the demand schema. To extract electricity demand profiles, the following tables have to be combined:
Example script to obtain the electrical demand timeseries for 1 specific building for the eGon2035 scenario:
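The original example script is distributed with the dataset; the following is only a minimal Python sketch of the lookup described above, using psycopg2 and pandas. The table names come from this description, but the connection details and the column names (building_id, profile_id, cell_id) are assumptions and may differ from the actual schema.

import pandas as pd
import psycopg2

# Connect to the restored PostgreSQL backup (placeholder credentials/database name).
conn = psycopg2.connect("dbname=egon_data user=user password=password host=localhost")

building_id = 12345  # hypothetical internal building_id

# Profiles assigned to this building (column names are assumptions).
mapping = pd.read_sql(
    "SELECT * FROM demand.egon_household_electricity_profile_of_buildings "
    "WHERE building_id = %(b)s",
    conn, params={"b": building_id})

# The matched profiles are then read from demand.iee_household_load_profiles and
# scaled by their annual sum and the eGon2035 scaling factors in
# demand.egon_household_electricity_profile_in_census_cell (matched per cell id),
# exactly as described above.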
According to our latest research, the global database development and management tools software market size reached USD 15.8 billion in 2024, reflecting robust demand across diverse sectors. The market is anticipated to expand at a CAGR of 13.2% during the forecast period, propelling the market to an estimated USD 44.2 billion by 2033. This impressive growth is driven by the escalating need for efficient data management, the proliferation of cloud-based solutions, and the increasing complexity of enterprise data environments. As organizations worldwide continue to digitize their operations and harness big data analytics, the demand for advanced database development and management tools software is set to surge.
One of the primary growth factors for the database development and management tools software market is the exponential increase in data volumes generated by businesses, governments, and individuals alike. The digital transformation wave sweeping across industries necessitates robust solutions for storing, organizing, and retrieving vast datasets with high reliability and speed. Organizations are increasingly leveraging data-driven insights to enhance decision-making, optimize operations, and personalize customer experiences. This reliance on data has compelled enterprises to invest in sophisticated database development and management tools that can handle complex queries, streamline data modeling, and ensure data integrity. As a result, both established enterprises and emerging startups are prioritizing investments in this market, further fueling its expansion.
Another significant driver of market growth is the rapid adoption of cloud computing technologies. Cloud-based database management solutions offer unparalleled scalability, flexibility, and cost-effectiveness compared to traditional on-premises systems. With organizations seeking to minimize IT infrastructure costs and improve accessibility, cloud deployment models are gaining substantial traction. This shift is particularly pronounced among small and medium enterprises (SMEs), which benefit from the reduced upfront investment and operational agility provided by cloud solutions. Additionally, the integration of artificial intelligence and machine learning capabilities into database tools is enabling automated performance monitoring, predictive maintenance, and advanced security management, further enhancing the value proposition of these solutions.
The growing emphasis on data security and regulatory compliance is also shaping the trajectory of the database development and management tools software market. With the rising incidence of cyberattacks and stringent data protection regulations such as GDPR, HIPAA, and CCPA, organizations are under pressure to safeguard sensitive information and ensure compliance. Advanced database management tools now incorporate robust security features, including encryption, access controls, and real-time threat detection, to address these concerns. Vendors are continuously innovating to provide end-to-end security management and automated compliance reporting, making their solutions indispensable for businesses operating in highly regulated industries such as BFSI, healthcare, and government.
Regionally, North America continues to dominate the market, accounting for the largest revenue share in 2024, followed closely by Europe and the Asia Pacific. The presence of leading technology providers, early adoption of digital technologies, and a strong focus on innovation contribute to North America's leadership. Meanwhile, the Asia Pacific region is experiencing the fastest growth, driven by rapid industrialization, increasing IT investments, and the proliferation of cloud-based services in emerging economies such as China and India. Europe maintains a steady growth trajectory, supported by stringent data protection regulations and a mature enterprise IT landscape. Latin America and the Middle East & Africa are also witnessing increased adoption, albeit at a slower pace, as organizations in these regions gradually embrace digital transformation.
This raster file represents land within the Mountain Home study boundary classified as either “irrigated” with a cell value of 1 or “non-irrigated” with a cell value of 0 at a 10-meter spatial resolution. These classifications were determined at the pixel level by use of Random Forest, a supervised machine learning algorithm. Classification models often employ Random Forest due to its accuracy and efficiency at labeling large spatial datasets. To build a Random Forest model and supervise the learning process, IDWR staff create pre-labeled data, or training points, which are used by the algorithm to construct decision trees that will later be used on unseen data. Model accuracy is determined using a subset of the training points, otherwise known as a validation dataset.
Several satellite-based input datasets are made available to the Random Forest model, which aid in distinguishing characteristics of irrigated lands. These characteristics allow patterns to be established by the model, e.g., high NDVI during summer months for cultivated crops, or consistently low ET for dryland areas. Mountain Home Irrigated Lands 2023 employed the following input datasets: US Geological Survey (USGS) products, including Landsat 8/9 and the 10-meter 3DEP DEM, and European Space Agency (ESA) Copernicus products, including Harmonized Sentinel-2 and Global 30m Height Above Nearest Drainage (HAND). For the creation of manually labeled training points, IDWR staff accessed the following datasets: NDVI derived from Landsat 8/9, Sentinel-2 CIR imagery, the US Department of Agriculture National Agricultural Statistics Service (USDA NASS) Cropland Data Layer, Active Water Rights Place of Use data from IDWR, and USDA’s National Agriculture Imagery Program (NAIP) imagery. All datasets were available for the current year of interest (2023).
The published Mountain Home Irrigated Lands 2023 land classification raster was generated after four model runs, where at each iteration IDWR staff added or removed training points to help improve results. Early model runs showed poor results in riparian areas near the Snake River, concentrated animal feeding operations (CAFOs), and non-irrigated areas at higher elevations. These issues were resolved after several model runs in combination with post-processing masks. Masks used include Fish and Wildlife Service’s National Wetlands Inventory (FWS NWI) data. These data were amended to exclude polygons overlying irrigated areas, and to expand riparian area in specific locations. A manually created mask was primarily used to fill in areas around the Snake River that the model did not uniformly designate as irrigated. Ground-truthing and a thorough review of IDWR’s water rights database provided further insight for class assignments near the town of Mayfield. Lastly, the Majority Filter tool in ArcGIS was applied using a kernel of 8 nearest neighbors to smooth out “speckling” within irrigated fields. The masking datasets and the final iteration of training points are available on request.
Information regarding Sentinel and Landsat imagery: All satellite data products used within the Random Forest model were accessed via the Google Earth Engine API.
To find more information on the Sentinel data used, query the Earth Engine Data Catalog (https://developers.google.com/earth-engine/datasets) using “COPERNICUS/S2_SR_HARMONIZED.” Information on the Landsat datasets used can be found by querying “LANDSAT/LC08/C02/T1_L2” (for Landsat 8) and “LANDSAT/LC09/C02/T1_L2” (for Landsat 9). Each satellite product has several bands of available data. For our purposes, shortwave infrared 2 (SWIR2), blue, Normalized Difference Vegetation Index (NDVI), and near infrared (NIR) were extracted from both Sentinel and Landsat images. These images were later interpolated to the following dates: 2023-04-15, 2023-05-15, 2023-06-14, 2023-07-14, 2023-08-13, 2023-09-12. Interpolated values were taken from up to 45 days before and after each interpolated date. April-June interpolated Landsat images, as well as the April interpolated Sentinel image, were not used in the model given the extent of cloud cover overlying irrigated area. For more information on the pre-processing of satellite data used in the Random Forest model, please reach out to IDWR at gisinfo@idwr.idaho.gov.
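For orientation only, here is a minimal Earth Engine Python API sketch that pulls the harmonized Sentinel-2 collection named above and derives an NDVI composite around one of the interpolation dates. The study-area rectangle and the simple median composite are stand-ins, not the IDWR pre-processing workflow.

import ee

ee.Initialize()

# Placeholder study area -- substitute the Mountain Home study boundary geometry.
aoi = ee.Geometry.Rectangle([-116.4, 42.9, -115.5, 43.4])

def add_ndvi(img):
    # Sentinel-2 bands: B8 = NIR, B4 = red.
    return img.addBands(img.normalizedDifference(["B8", "B4"]).rename("NDVI"))

s2 = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
      .filterBounds(aoi)
      .filterDate("2023-05-30", "2023-08-28")   # roughly +/- 45 days around 2023-07-14
      .map(add_ndvi))

ndvi_composite = s2.select("NDVI").median().clip(aoi)
print(ndvi_composite.getInfo())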
Are you looking for data that tell if the companies or persons you look into own any patents? If they do, do you want to know how many patents they own?
The Assignee Query Data will provide you with timely and comprehensive results on the global patent ownership of companies or individuals, with 50 years of history.
How do we do that?
We include decades’ worth of global full-text databases, such as the US, China, EM/EUIPO, Japan, Korea, WIPO and so on, and keep them updated on a timely basis—as frequently as every day or week, depending on the sources.
Furthermore, the downloaded data are cleansed to minimize data errors and thus search and analysis errors. For example, we standardize assignee names to enable individual patents to correspond to a single owner; logic-based corrections ensure that values are corrected based on rules.
In addition, we use advanced algorithms to analyze, select, and present the most current and accurate information from multiple available data sources. For instance, a single patent's legal status is triangulated across different patent data sources for accuracy. Moreover, proprietary Quality and Value rankings subject patents in each key market to the same evaluation process, offering predictions of each patent's likelihood of validity and monetization.
Objective: The objective of this track is to evaluate the performance of 3D shape retrieval approaches on a large-scale comprehensive 3D shape database which contains different types of models, such as generic, articulated, CAD and architecture models.
Introduction: With the increasing number of 3D models created every day and stored in databases, the development of effective and scalable 3D search algorithms has become an important research area. In this contest, the task is to retrieve 3D models similar to a complete 3D model query from a new integrated large-scale comprehensive 3D shape benchmark including various types of models. Owing to the integration of the most important existing benchmarks to date, the newly created benchmark is the most exhaustive to date in terms of the number of semantic query categories covered, as well as the variation of model types. The shape retrieval contest will allow researchers to evaluate results of different 3D shape retrieval approaches when applied to a large-scale comprehensive 3D database. The benchmark is motivated by a recent large collection of human sketches built by Eitz et al. [1]. To explore how humans draw sketches and to study human sketch recognition, they collected 20,000 human-drawn sketches, categorized into 250 classes, each with 80 sketches. This sketch dataset is exhaustive in terms of the number of object categories. Thus, we believe that a 3D model retrieval benchmark based on their object categorization is more comprehensive and appropriate than currently available 3D retrieval benchmarks for objectively and accurately evaluating the real practical performance of a comprehensive 3D model retrieval algorithm if implemented and used in the real world. Considering this, we built the SHREC'14 Large Scale Comprehensive Track Benchmark (SHREC14LSGTB) by collecting relevant models from the major previously proposed 3D object retrieval benchmarks. Our target was to find models for as many of the 250 classes as possible, and as many models as possible for each class. These previous benchmarks were compiled with different goals in mind and had not, to date, been considered in their sum. Our work is the first to integrate them to form a new, larger benchmark corpus for comprehensive 3D shape retrieval.
Dataset: The SHREC'14 Large Scale Comprehensive Retrieval Track Benchmark has 8,987 models, categorized into 171 classes. We adopted a voting scheme to classify models. For each classification, we have at least two votes. If these two votes agree with each other, we confirm that the classification is correct; otherwise, we perform a third vote to finalize the classification. All the models are categorized according to the classifications in Eitz et al. [1], based on visual similarity.
Evaluation Method: To provide a comprehensive evaluation of the retrieval algorithms, we employ seven commonly adopted performance metrics from the 3D model retrieval literature.
Please cite the papers: [1] Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Martin Burtscher, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Hongbo Fu, Takahiko Furuya, Haisheng Li, Jianzhuang Liu, Henry Johan, Ryuichi Kosaka, Hitoshi Koyanagi, Ryutarou Ohbuchi, Atsushi Tatsuma, Yajuan Wan, Chaoli Zhang, Changqing Zou. A Comparison of 3D Shape Retrieval Methods Based on a Large-scale Benchmark Supporting Multimodal Queries. Computer Vision and Image Understanding, November 4, 2014.
[2] Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Takahiko Furuya, Henry Johan, Ryuichi Kosaka, Hitoshi Koyanagi, Ryutarou Ohbuchi, Atsushi Tatsuma. SHREC' 14 Track: Large Scale Comprehensive 3D Shape Retrieval. Eurographics Workshop on 3D Object Retrieval 2014 (3DOR 2014): 131-140, 2014.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Step-by-step instructions have been extracted from wikiHow in 16 different languages and decomposed into a formal graph representation like the one shown in the picture below. The source pages from which the instructions were extracted have also been collected and can be shared upon request.
Instructions are represented in RDF following the PROHOW vocabulary and data model. For example, the category, steps, requirements and methods of each set of instructions have been extracted.
This dataset has been produced as part of the The Web of Know-How project.
The large amount of data can make it difficult to work with this dataset. This is why an instruction-extraction python script was developed. This script allows you to:
The class_hierarchy.ttl file attached to this dataset is used to determine whether a set of instructions falls under a certain category or not. The script is available on this GitHub repository.
This page contains the link to the different language versions of the data.
A previous version of this type of data, although for English only, is also available on Kaggle:
For the multilingual dataset, this is the list of the available languages and number of articles in each:
The dataset is in RDF and it can be queried in SPARQL. Sample SPARQL queries are available in this GitHub page.
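As a starting point, here is a small Python/rdflib sketch for loading one of the RDF files and running a SPARQL query. The file name is a placeholder, and the PROHOW namespace and property name (has_step) are assumptions based on the vocabulary description, so check them against the sample queries on the GitHub page.

from rdflib import Graph

g = Graph()
g.parse("instructions_en.ttl", format="turtle")  # placeholder file name

# Hypothetical PROHOW property -- verify the exact vocabulary terms before use.
query = """
PREFIX prohow: <http://w3id.org/prohow#>
SELECT ?instructions ?step
WHERE {
    ?instructions prohow:has_step ?step .
}
LIMIT 10
"""

for row in g.query(query):
    print(row.instructions, row.step)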
Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.
e.g. (Win) C:\> crab C:\some\path\MyProject
or (Mac) $ crab /some/path/MyProject
You get a CRAB> prompt where you can enter SQL queries on the data, e.g. Count files by extension
SELECT extension, count(*)
FROM files
GROUP BY extension;
e.g. List the 5 biggest directories
SELECT parentpath, sum(bytes)/1e9 as GB
FROM files
GROUP BY parentpath
ORDER BY sum(bytes) DESC LIMIT 5;
Crab provides a virtual table, fileslines, which exposes file contents to SQL
e.g. Count TODO and FIXME entries in any .c files, recursively
SELECT fullpath, count(*) FROM fileslines
WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c'
and (data like '%TODO%' or data like '%FIXME%')
GROUP BY fullpath;
There are also functions to run programs or shell commands on any subset of files, or lines within files, e.g. (Mac) unzip all the .zip files, recursively
SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/')
FROM files
WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';
(Here -n tells unzip not to overwrite anything, and -d specifies target directory)
There is also a function to write query output to file, e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file
SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data)
FROM fileslines
WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt'
ORDER BY data;
In place of the interactive prompt you can run queries in batch mode, e.g. here is a one-liner that returns the full path of all the files in the current directory
C:\> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"
Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.
Crab is free for personal use, $5/mo commercial
See more details here (Mac): http://etia.co.uk/ or here (Win): http://etia.co.uk/win/about/
An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.
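If you would rather poke at the example database from Python than from the Crab shell, a short sqlite3 sketch (standard library only) reproduces the count-by-extension query above. The file name database.sqlite is as uploaded; adjust the path to wherever you saved it. Note that the virtual tables and support functions are only available in the Crab shell, so plain SQLite can query the files table but not fileslines.

import sqlite3

# Open the uploaded example database (adjust the path as needed).
conn = sqlite3.connect("database.sqlite")

# Same query as the interactive example above: count files by extension.
rows = conn.execute(
    "SELECT extension, count(*) FROM files "
    "GROUP BY extension ORDER BY count(*) DESC LIMIT 10").fetchall()

for extension, n in rows:
    print(extension, n)

conn.close()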
To scan your own files, and get access to the virtual tables and support functions, you have to use the Crab SQLite shell, available for download from this page (Mac): http://etia.co.uk/download/ or this page (Win): http://etia.co.uk/win/download/
The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'
COLUMNS
fileid (int) primary key -- files table row number, a unique id for each item
name (text) -- item name e.g. 'Hei.ttf'
bytes (int) -- item size in bytes e.g. 7502752
depth (int) -- how far scan recursed to find the item, starts at 0
accessed (text) -- datetime item was accessed
modified (text) -- datetime item was modified
basename (text) -- item name without path or extension, e.g. 'Hei'
extension (text) -- item extension including the dot, e.g. '.ttf'
type (text) -- item type, 'f' for file or 'd' for directory
mode (text) -- further type info and permissions, e.g. 'drwxr-xr-x'
parentpath (text) -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
fullpath (text) unique -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'
PATHS
1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
2) Directory paths all have a '/' on the end.
The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.
This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.
COLUMNS
linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
data (text) -- data content of the files, one entry for each line
FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.
https://spdx.org/licenses/CC0-1.0.html
Scientists at the National Center for Atmospheric Research have recently carried out several experiments to better understand the uncertainties associated with future climate projections. In particular, the NCAR Climate and Global Dynamics Lab (CGDL) working group has completed a large Parameter Perturbation Experiment (PPE) utilizing the Community Land Model (CLM), testing the effects of 32 parameters over thousands of simulations over a range of 250 years. The CLM model experiment is focused on understanding uncertainty around biogeophysical parameters that influence the balance of chemical cycling and sequestration variables. The current website for displaying model results is not intuitive or informative to the broader scientific audience or the general public. The goal of this project is to develop an improved data visualization dashboard for communicating the results of the CLM PPE. The interactive dashboard would provide an interface where new or experienced users can query the experiment database to ask which environmental processes are affected by a given model parameter, or vice versa. Improving the accessibility of the data will allow professionals to use the most recent land parameter data when evaluating the impact of a policy or action on climate change.
Methods
Data Source:
University of California, Santa Barbara – Climate and Global Dynamics Lab, National Center for Atmospheric Research: Parameter Perturbation Experiment (CGD NCAR PPE-5). https://webext.cgd.ucar.edu/I2000/PPEn11_OAAT/ (the only public version of the data currently accessible; the data leveraged in this project is currently stored on the NCAR server and is not publicly available). https://www.cgd.ucar.edu/events/seminar/2023/katie-dagon-and-daniel-kennedy-132940 (learn more about this complex data via this amazing presentation by Katie Dagon and Daniel Kennedy). The Parameter Perturbation Experiment data leveraged by our project was generated utilizing Community Land Model v5 (CLM5) predictions. https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.CLM_LAND_ONLY.html
Data Processing:
We worked inside NCAR's CASPER HPC cluster, which gave us direct access to the raw data files. We created a script to read in 500 LHC PPE simulations as a data set, with inputs for a climate variable and a time range. When reading in the cluster of simulations, a preprocess function performs dimensional reduction to simplify the data set for wrangling later.
Once the data sets of interest were loaded, they were ready for some dimensional corrections, the quirks that come with using CESM data. Our friends at NCAR CGDL provided us with the fix for the time-pairing bug. The other functions, which weight each grid cell by land area, properly weight each month according to its contribution to the number of days in a year, and calculate the global average of each simulation, were written by our team to wrangle the data so it is suitable for emulation. These files were saved so they could be leveraged later using a built-in if-else statement within the read_n_wrangle() function.
The preprocessed data is then used in the GPR ML Emulator to make 100 predictions for a climate variable of interest and 32 individual parameters. To summarize briefly without getting too into the nitty gritty, our GPR emulator does 3 things:
1. Simplifies the LHC data so it can look at 1 parameter at a time and assess its relationship with a climate variable.
2. Applies Fourier Amplitude Sensitivity Analysis to identify relationships between parameters and climate variables. It helps us see what the key influencers are.
3. In the full chaotic LHC, it can assess the covariance of the parameter-parameter predictions simultaneously (this is the R^2 value you'll see on your accuracy inset plot later).
Additionally, it 'pickles' and saves the predictions and the trained gpr_model so they can be utilized for further analysis, exploration, and visualizations. The attributes and structures defined in this notebook outline the workflow utilized to generate the data in this repo; it pulls functions from utils.py to execute the desired commands. Below we will look at the utils.py functions that are not explicitly defined in the notebook. A general side note: if you decide to explore the notebook explaining how the data was made, you'll notice you'll be transported to another repo in this organization, GaiaFuture. That's our prototype playground! It's a little messy because that's where we spent the second half of this project tinkering. The official repository is https://github.com/GaiaFuture/CLM5_PPE_Emulator.
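For readers who want a feel for the emulation step, below is a generic Gaussian process regression sketch with scikit-learn. It is not the project's emulator (that code lives in utils.py and the repositories above); the data, parameter sweep, and variable names are invented purely for illustration.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy stand-in for the wrangled PPE output: 500 LHC simulations,
# 32 perturbed parameters, one globally averaged climate variable.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 32))                                   # scaled parameter values
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.05, 500)    # fake response

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(X, y)

# Vary one parameter at a time across 100 points, holding the others at 0.5,
# mirroring the "100 predictions per parameter" described above.
sweep = np.tile(0.5, (100, 32))
sweep[:, 0] = np.linspace(0, 1, 100)
mean, std = gpr.predict(sweep, return_std=True)

print(mean[:5], std[:5])
print(gpr.score(X, y))   # R^2 on the training data (cf. the accuracy inset plot)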
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!
Data Dictionary
The MetaMathQA dataset contains three columns: response, type, and query.
- Response: the response to the query given by the question answering system. (String)
- Type: the type of query provided as input to the system. (String)
- Query: the question posed to the system for which a response is required. (String)
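A minimal pandas sketch for loading the file and inspecting these three columns (assuming train.csv ships with a header row):

import pandas as pd

df = pd.read_csv("train.csv")            # columns: response, type, query
print(df[["query", "type", "response"]].head())
print(df["type"].value_counts())         # distribution of query types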
Preparing data for analysis
Before you dive into analysis, it's important to familiarize yourself with the kinds of data values present in each column and to check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so that the data can be used without issue while training or testing your model further down in your process flow.
##### Training Models using Mistral 7B
Mistral 7B is an open source framework designed for building machine learning models quickly and easily from tabular (CSV) datasets such as this 'MetaMathQA' dataset. After collecting and preprocessing your dataset, Mistral 7B provides support for various machine learning algorithms such as Support Vector Machines (SVM), logistic regression, and decision trees, allowing you to select these algorithms from various popular libraries together with hyperparameter optimization techniques. After selecting an algorithm configuration, it is good practice to use GridSearchCV and RandomSearchCV to further tune the optimization during the model building stage. After model selection, you can then validate the performance of the selected models through metrics such as accuracy, F1 score, precision, and recall.
##### Testing models
After the model building phase is complete, the next step is to robustly test the models against the evaluation metrics mentioned above. At this stage you can make predictions with the trained model on new test cases presented by domain experts, run quality-assurance checks against the baseline metrics, assess confidence values after execution, and update baseline scores as you run further experiments. This is the preferred methodology for AI workflows, because its core advantage is keeping the overall impact of relevance gaps and inexactness-induced errors low.
- Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
- Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
- Optimizing search algorithms that surface relevant answer results based on types of queries
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:-------------------------------------------------|
| response    | The response to the query. (String)               |
| type        | The type of query. (String)                       |
| query       | The question posed to the system. (String)        |
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments. Part of the DEEPEN project involved developing and testing a methodology for a 3D play fairway analysis (PFA) for multiple play types (conventional hydrothermal, superhot EGS, and supercritical). This was tested using new and existing geoscientific exploration datasets at Newberry Volcano. This GDR submission includes images, data, and models related to the 3D favorability and uncertainty models and the 2D favorability and uncertainty maps.
The DEEPEN PFA Methodology is based on the method proposed by Poux et al. (2020), which uses the Leapfrog Geothermal software with the Edge extension to conduct PFA in 3D. This method uses all available data to build a 3D geodata model which can be broken down into smaller blocks and analyzed with advanced geostatistical methods. Each data set is imported into a 3D model in Leapfrog and divided into smaller blocks. Conditional queries can then be used to assign each block an index value which conditionally ranks each block's favorability, from 0-5 with 5 being most favorable, for each model (e.g., lithologic, seismic, magnetic, structural). The values between 0-5 assigned to each block are referred to as index values. The final step of the process is to combine all the index models to create a favorability index. This involves multiplying each index model by a given weight and then summing the resulting values.
The DEEPEN PFA Methodology follows this approach, but split up by the specific geologic components of each play type. These components are defined as follows for each magmatic play type:
1. Conventional hydrothermal plays in magmatic environments: Heat, fluid, and permeability
2. Superhot EGS plays: Heat, thermal insulation, and producibility (the ability to create and sustain fractures suitable for an EGS reservoir)
3. Supercritical plays: Heat, supercritical fluid, pressure seal, and producibility (the proper permeability and pressure conditions to allow production of supercritical fluid)
More information on these components and their development can be found in Kolker et al., 2022.
For the purposes of subsurface imaging, it is easier to detect a permeable fluid-filled reservoir than it is to detect separate fluid and permeability components. Therefore, in this analysis, we combine fluid and permeability for conventional hydrothermal plays, and supercritical fluid and producibility for supercritical plays. More information on this process is described in the following sections. We also project the 3D favorability volumes onto 2D surfaces for simplified joint interpretation, and we incorporate an uncertainty component. Uncertainty was modeled using the best approach for the dataset in question, for the datasets where we had enough information to do so. Identifying which subsurface parameters are the least resolved can help qualify current PFA results and focus future efforts in data collection. Where possible, the resulting uncertainty models/indices were weighted using the same weights applied to the respective datasets, and summed, following the PFA methodology above, but for uncertainty.
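As a schematic of the index-combination step described above (a weighted sum of 0-5 index models on a common block grid), and not the Leapfrog/Edge implementation itself, a minimal Python sketch with made-up weights and random index grids might look like this:

import numpy as np

# Hypothetical 0-5 index models on the same block grid (nx, ny, nz).
rng = np.random.default_rng(0)
lithologic = rng.integers(0, 6, size=(50, 50, 20))
seismic    = rng.integers(0, 6, size=(50, 50, 20))
magnetic   = rng.integers(0, 6, size=(50, 50, 20))
structural = rng.integers(0, 6, size=(50, 50, 20))

# Illustrative weights only; the DEEPEN workflow assigns them per play component
# (heat, fluid, permeability, etc.).
weights = {"lithologic": 0.3, "seismic": 0.3, "magnetic": 0.2, "structural": 0.2}

favorability = (weights["lithologic"] * lithologic
                + weights["seismic"] * seismic
                + weights["magnetic"] * magnetic
                + weights["structural"] * structural)

# 2D favorability map: project the 3D volume onto the surface (e.g. max over depth).
favorability_map = favorability.max(axis=2)
print(favorability_map.shape, favorability_map.max())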
There are two different versions of the Leapfrog model and associated favorability models:
- v1.0: The first release in June 2023
- v2.1: The second release, with improvements made to the earthquake catalog (included additional identified events, removed duplicate events), to the temperature model (fixed a deep BHT), and to the index models (updated the seismicity-heat source index models for supercritical and EGS, and the resistivity-insulation index models for all three play types). Also uses the jet color map rather than the magma color map for improved interpretability.
- v2.1.1: Updated to include v2.0 uncertainty results (see below for uncertainty model versions)

There are two different versions of the associated uncertainty models:
- v1.0: The first release in June 2023
- v2.0: The second release, with improvements made to the temperature and fault uncertainty models.

** Note that this submission is deprecated and that a newer submission, linked below and titled "DEEPEN Final 3D PFA Favorability Models and 2D Favorability Maps at Newberry Volcano", contains the final versions of these resources. **
The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.
Asset database for the Hunter subregion on 24 February 2016 (V2.5) supersedes the previous version of the HUN Asset database V2.4 (Asset database for the Hunter subregion on 20 November 2015, GUID: 0bbcd7f6-2d09-418c-9549-8cbd9520ce18). It contains the Asset database (HUN_asset_database_20160224.mdb), a Geodatabase version for GIS mapping purposes (HUN_asset_database_20160224_GISOnly.gdb), the draft Water Dependent Asset Register spreadsheet (BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20160224.xlsx), a data dictionary (HUN_asset_database_doc_20160224.doc), and a folder (NRM_DOC) containing documentation associated with the Water Asset Information Tool (WAIT) process as outlined below. This version should be used for the Materiality Test 2 (M2).
The Asset database is registered to the BA repository as an ESRI personal geodatabase (.mdb, doubling as an MS Access database) that can store, query, and manage non-spatial data, while the spatial data is in a separate file geodatabase joined by AID/ElementID.
Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.
Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database.
Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "HUN_asset_database_doc_20160224.doc", located in this file.
The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.
Detailed information describing the database structure and content can be found in the document "HUN_asset_database_doc_20160224.doc" located in this file.
Some of the source data used in the compilation of this dataset is restricted.
The public version of this asset database can be accessed via the following dataset: Asset database for the Hunter subregion on 24 February 2016 Public 20170112 v02 (https://data.gov.au/data/dataset/9d16592c-543b-42d9-a1f4-0f6d70b9ffe7)
OBJECTID VersionID Notes Date_
1 1 Initial database. 29/08/2014
3 1.1 Update the classification for seven identical assets from Gloucester subregion 16/09/2014
4 1.2 Added in NSW GDEs from Hunter - Central Rivers GDE mapping from NSW DPI (50 635 polygons). 28/01/2015
5 1.3 New AIDs assigned to NSW GDE assets (Existing AID + 20000) to avoid duplication of AIDs assigned in other databases. 12/02/2015
6 1.4 "(1) Add 20 additional datasets required by HUN assessment project team after HUN community workshop
(2) Turn off previous GW point assets (AIDs from 7717-7810 inclusive)
(3) Turn off new GW point asset (AID: 0)
(4) Assets (AIDs: 8023-8026) are duplicated to 4 assets (AID: 4747,4745,4744,4743 respectively) in NAM subregion . Their AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry are using
values from that NAM assets.
(5) Asset (AID 8595) is duplicated to 1 asset ( AID 57) in GLO subregion . Its AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry are using values from that GLO assets.
(6) 39 assets (AID from 2969 to 5040) are from NAM Asset database and their attributes were updated to use the latest attributes from NAM asset database
(7)The databases, especially spatial database, were changed such as duplicated attributes fields in spatial data were removed and only ID field is kept. The user needs to join the Table Assetlist or Elementlist to
the spatial data" 16/06/2015
7 2 "(1) Updated 131 new GW point assets with previous AID and some of them may include different element number due to the change of 77 FTypes requested by Hunter assessment project team
(2) Added 104 EPBC assets, which were assessed and excluded by ERIN
(3) Merged 30 Darling Hardyhead assets to one (asset AID 60140) and deleted another 29
(4) Turned off 5 assets from community workshop (60358 - 60362) as they are duplicated to 5 assets from 104 EPBC excluded assets
(5) Updated M2 test results
(6) Asset Names (AID: 4743 and 4747) were changed as requested by Hunter assessment project team (4 lower cases to 4 upper case only). Those two assets are from Namoi asset database and their asset names
may not match with original names in Namoi asset database.
(7)One NSW WSP asset (AID: 60814) was added in as requested by Hunter assessment project team. The process method (without considering 1:M relation) for this asset is not robust and is different to other NSW
WSP assets. It should NOT use for other subregions.
(8) Queries of Find_All_Used_Assets and Find_All_WD_Assets in the asset database can be used to extract all used assets and all water dependent assets" 20/07/2015
8 2.1 "(1) There are following six assets (in Hun subregion), which is same as 6 assets in GIP subregion. Their AID, Asset Name, Group, SubGroup, Depth, Source and ListDate are using values from GIP assets. You will
not see AIDs from AID_from_HUN in whole HUN asset datable and spreadsheet anymore and you only can see AIDs from AID_from_GIP ( Actually (a) AID 11636 is GIP got from MBC (B) only AID, Asset Name
and ListDate are different and changed)
(2) For BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx, (a) Extracted long ( >255 characters) WD rationale for 19 assets (AIDs:
8682,9065,9073,9087,9088,9100,9102,9103,60000,60001,60792,60793,60801,60713,60739,60751,60764,60774,60812 ) in tab "Water-dependent asset register" and 37 assets (AIDs:
5040,8651,8677,8682,8650,8686,8687,8718,8762,9094,9065,9067,9073,9077,9081,9086,9087,9088,9100,9102,9103,60000,60001,60739,60742,60751,60713,60764,60771,
60774,60792,60793,60798,60801,60809,60811,60812) in tab "Asset list" in 1.30 Excel file (b) recreated draft BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx
(3) Modified queries (Find_All_Asset_List and Find_Waterdependent_asset_register) for (2)(a)" 27/08/2015
9 2.2 "(1) Updated M2 results from the internal review for 386 Sociocultural assets
(2)Updated the class to Ecological/Vegetation/Habitat (potential species distribution) for assets/elements from sources of WAIT_ALA_ERIN, NSW_TSEC, NSW_DPI_Fisheries_DarlingHardyhead" 8/09/2015
10 2.3 "(1) Updated M2 results from the internal review
* Changed "Assessment team do not say No" to "All economic assets are by definition water dependent"
* Changed "Assessment team say No" to "These are water dependent, but excluded by the project team based on intersection with the PAE is negligible"
* Changed "Rivertyles" to "RiverStyles"" 22/09/2015
11 2.4 "(1) Updated M2 test results for 86 assets from the external review
(2) Updated asset names for two assets (AID: 8642 and 8643) required from the external review
(3) Created Draft Water Dependent Asset Register file using the template V5" 20/11/2015
12 2.5 "Total number of registered water assets was increased by 1 (= +2-1) due to:
Two assets changed M2 test from "No" to "Yes", but one asset changed M2 test from "Yes" to "No"
from the review done by Ecologist group." 24/02/2016
Bioregional Assessment Programme (2015) Asset database for the Hunter subregion on 24 February 2016. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/a39290ac-3925-4abc-9ecb-b91e911f008f.
Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From Travelling Stock Route Conservation Values
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From NSW Wetlands
Derived From Climate Change Corridors Coastal North East NSW
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is based on the model developed with the Ph.D. students of the Communication and Information Sciences Ph.D. program at the University of Hawaii at Manoa, intended to help new students get relevant information. The model was first presented at the iConference 2023, in the paper "Community Design of a Knowledge Graph to Support Interdisciplinary Ph.D. Students" by Stanislava Gardasevic and Rich Gazan (available at: https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/9eebcea7-06fd-4db3-b420-347883e6379e/content).
The database is created in Neo4J, and the .dump file can be imported into a cloud instance of this software. The dataset (.dump) contains publicly available data collected from multiple web locations, and indexes of a sample of publications from the people in this domain. In addition, it contains my (the first author's) personal graph demonstrating progress through a student's program in this degree and activities they have done while in the program. This dataset was made possible with the huge help of my collaborator, Petar Popovic, who ingested the data into the database.
The model and dataset were developed while involving the end users in the design and are based on the actual information needs of a population. It is intended to allow researchers to investigate multigraph visualization of the data modeled by the said model.
The knowledge graph was evaluated with the CIS student population, and the study results show that it is very helpful for decision-making, information discovery, and identification of people in one's surroundings who might be good collaborators or information points. We provide the .json file containing the Neo4J Bloom perspective with the styling and queries used in these evaluation sessions.
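Once the .dump is restored (for example in Neo4j Aura or Neo4j Desktop), the graph can be queried from Python with the official neo4j driver. The connection details and the node labels and relationship types in the Cypher below are hypothetical, so adapt them to the actual schema and to the Bloom perspective provided with the dataset.

from neo4j import GraphDatabase

# Placeholder connection details for the restored database.
driver = GraphDatabase.driver("neo4j+s://<your-instance>.databases.neo4j.io",
                              auth=("neo4j", "<password>"))

# Hypothetical labels/relationships -- check the actual model before use.
cypher = """
MATCH (s:Student)-[:INTERESTED_IN]->(t:Topic)<-[:PUBLISHED_ON]-(p:Person)
RETURN t.name AS topic, collect(DISTINCT p.name) AS potential_collaborators
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(cypher):
        print(record["topic"], record["potential_collaborators"])

driver.close()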
This dataset has recent, preliminary (not quality-controlled), 1-minute, water level (tide) data from the NOAA NOS Center for Operational Oceanographic Products and Services (CO-OPS).
WARNING: These raw data have not been subjected to the National Ocean Service's quality control or quality assurance procedures and do not meet the criteria and standards of official National Ocean Service data. They are released for limited public use as preliminary data to be used only with appropriate caution.
WARNING:
* Queries for data MUST include stationID=, datum=, and time>=.
* Queries for data USUALLY include time<=.
* Queries MUST be for less than 30 days worth of data. The default time<= value corresponds to 'now'.
* Different stations support different datums. Use ERDDAP's Subset web page to find out which datums a given station supports.
* The data source isn't completely reliable. If your request returns no data when you think it should:
  * Make sure the station you specified supports the datum you specified.
  * Try revising the request (e.g., a different datum or a different time range).
  * The list of stations offering this data (or the list of datums) may be incorrect.
  * Sometimes a station or the entire data service is unavailable. Wait a while and try again.
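A hedged Python sketch of such a request is shown below. The ERDDAP base URL, dataset ID, and variable names are placeholders (only the stationID/datum/time constraints follow the rules listed above), and some HTTP clients require the constraint operators to be percent-encoded.

import pandas as pd

# Placeholders: substitute the actual ERDDAP base URL and dataset ID
# for this CO-OPS 1-minute water level dataset.
base = "https://<erddap-server>/erddap/tabledap/<datasetID>.csvp"

# Constraints per the rules above: stationID, datum and time>= are required,
# and the request must cover less than 30 days of data.
url = (base
       + "?time,waterLevel"                 # hypothetical variable names
       + "&stationID=%228454000%22"         # example station id, quotes URL-encoded
       + "&datum=%22MLLW%22"
       + "&time>=2024-01-01T00:00:00Z"
       + "&time<=2024-01-07T00:00:00Z")

df = pd.read_csv(url)
print(df.head())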
Quadrant provides Insightful, accurate, and reliable mobile location data.
Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.
These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.
We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points – allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplicating algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
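Conceptually, the deduplication step resembles the following pandas sketch. It is for illustration only; the input file name and the exact column spellings are assumptions, but the four key attributes are those named above.

import pandas as pd

events = pd.read_csv("mobility_events.csv")   # placeholder input file

# Keep a single copy of any rows sharing the same combination of the four
# key attributes described above.
deduped = events.drop_duplicates(
    subset=["device_id", "latitude", "longitude", "timestamp"], keep="first")

print(f"Removed {len(events) - len(deduped)} duplicate events")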
We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.
Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.
Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data release provides the reanalysis streamflow data from versions 1.2, 2.0, and 2.1 of the National Water Model, restructured for time-series extraction. As a result, users can query the time series for a given NHDPlusV2 COMID without downloading the hourly CONUS files and extracting the relevant values themselves.
The data is hosted on the RENCI THREDDS Data Server and is accessible via OPeNDAP at the following URLs:
Version 1.2 (https://thredds.hydroshare.org/thredds/catalog/nwm/retrospective/catalog.html?dataset=NWM_Retrospective/nwm_retro_full.ncml): spans 1993-01-01 00:00:00 to 2017-12-31 23:00:00; contains 219,144 hourly time steps for 2,729,077 NHD reaches.
Version 2.0 (https://thredds.hydroshare.org/thredds/catalog/nwm/retrospective/catalog.html?dataset=NWM_Retrospective/nwm_v2_retro_full.ncml): spans 1993-01-01 00:00:00 to 2018-12-31 00:00:00; contains 227,903 hourly time steps for 2,729,076 NHD reaches.
Version 2.1 (https://cida.usgs.gov/thredds/catalog/demo/morethredds/nwm/nwm_v21_retro_full.ncml): spans 1979-02-02 18:00:00 to 2020-12-31 00:00:00; contains 227,903 hourly time steps for 2,729,076 NHD reaches.
Raw Data (https://registry.opendata.aws/nwm-archive/): 227,000+ hourly netCDF files (depending on version).
The Dataset Descriptor Structure (DDS) can be viewed on the NcML page for each respective resource (linked above). More broadly, each resource includes:
The nwmTools R package provides easier interaction with the OPeNDAP resources. Package documentation can be found here and the GitHub repository here.
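For users working outside R, a minimal Python sketch against the version 2.1 aggregation above might look like the following; the OPeNDAP data URL (the usual THREDDS /dodsC/ path in place of /catalog/), the variable names (feature_id, streamflow), and the example COMID are assumptions based on common NWM and THREDDS conventions, not taken from this record.

```python
import xarray as xr

# OPeNDAP data URL for the version 2.1 aggregation (assumed /dodsC/ service path
# corresponding to the catalog link listed above).
url = "https://cida.usgs.gov/thredds/dodsC/demo/morethredds/nwm/nwm_v21_retro_full.ncml"

# Open lazily over OPeNDAP; values are only transferred when requested.
ds = xr.open_dataset(url)

# Pull the hourly streamflow time series for a single NHDPlusV2 COMID.
comid = 101  # hypothetical reach identifier
flow = ds["streamflow"].sel(feature_id=comid)

# Transfer just the first day of values.
print(flow.isel(time=slice(0, 24)).values)
```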
This effort is supported by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. under the HydroInformatics Fellowship. See program here
Johnson, J.M., Blodgett, D.L., Clarke, K.C., and Pollack, J. (2020). "Restructuring and serving web-accessible streamflow data from the NOAA National Water Model historic simulations." Scientific Data. (In review)
Part of the DEEPEN (DE-risking Exploration of geothermal Plays in magmatic ENvironments) project involved developing and testing a methodology for a 3D play fairway analysis (PFA) for multiple play types (conventional hydrothermal, superhot EGS, and supercritical). This was tested using new and existing geoscientific exploration datasets at Newberry Volcano. This GDR submission includes images, data, and models related to the 3D favorability and uncertainty models and the 2D favorability and uncertainty maps.
The DEEPEN PFA Methodology, detailed in the journal article below, is based on the method proposed by Poux & O'Brien (2020), which uses the Leapfrog Geothermal software with the Edge extension to conduct PFA in 3D. This method uses all available data to build a 3D geodata model, which is broken down into smaller blocks and analyzed with advanced geostatistical methods. Each dataset is imported into a 3D model in Leapfrog and divided into smaller blocks. Conditional queries then assign each block an index value that ranks its favorability from 0 to 5, with 5 being most favorable, for each model (e.g., lithologic, seismic, magnetic, structural). The final step is to combine all the index models into a favorability index: each index model is multiplied by a given weight and the resulting values are summed.
The DEEPEN PFA Methodology follows this approach, but split up by the specific geologic components of each play type. These components are defined as follows for each magmatic play type:
1. Conventional hydrothermal plays in magmatic environments: heat, fluid, and permeability.
2. Superhot EGS plays: heat, thermal insulation, and producibility (the ability to create and sustain fractures suitable for an EGS reservoir).
3. Supercritical plays: heat, supercritical fluid, pressure seal, and producibility (the permeability and pressure conditions that allow production of supercritical fluid).
More information on these components and their development can be found in Kolker et al. (2022).
For the purposes of subsurface imaging, it is easier to detect a permeable, fluid-filled reservoir than to detect separate fluid and permeability components. Therefore, in this analysis we combine fluid and permeability for conventional hydrothermal plays, and supercritical fluid and producibility for supercritical plays. We also project the 3D favorability volumes onto 2D surfaces for simplified joint interpretation, and we incorporate an uncertainty component. Uncertainty was modeled using the best approach for each dataset, where enough information was available to do so. Identifying which subsurface parameters are least resolved can help qualify current PFA results and focus future data collection. Where possible, the resulting uncertainty models/indices were weighted using the same weights applied to the respective datasets and summed, following the PFA methodology above, but for uncertainty.
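To make the weighted-sum step concrete (the Leapfrog workflow itself is not reproduced here), a minimal NumPy sketch follows; the grid shape, the example weights, and the vertical-maximum projection are assumptions, while the 0-5 index scale and the multiply-and-sum combination come from the description above.

```python
import numpy as np

# Hypothetical index models on a common block grid, each block ranked 0-5
# (5 = most favorable). Shapes and values are purely illustrative.
rng = np.random.default_rng(0)
shape = (50, 50, 20)  # nx, ny, nz blocks (assumed grid size)
index_models = {
    "lithologic": rng.integers(0, 6, size=shape),
    "seismic": rng.integers(0, 6, size=shape),
    "magnetic": rng.integers(0, 6, size=shape),
    "structural": rng.integers(0, 6, size=shape),
}

# Assumed weights for each index model; the actual DEEPEN weights are not given here.
weights = {"lithologic": 0.3, "seismic": 0.3, "magnetic": 0.2, "structural": 0.2}

# Favorability index: weighted sum of the index models, block by block.
favorability = sum(weights[name] * model.astype(float)
                   for name, model in index_models.items())

# Project the 3D favorability volume onto a 2D map, e.g. by taking the
# maximum favorability along the vertical axis (projection choice assumed).
favorability_map = favorability.max(axis=2)
print(favorability.shape, favorability_map.shape)
```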