100+ datasets found
  1. Text Analyses of Survey Data on "Mapping Research Output to the Sustainable...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 20, 2020
    Cite
    Maurice Vanderfeesten; Linda Hasse (2020). Text Analyses of Survey Data on "Mapping Research Output to the Sustainable Development Goals (SDGs)" [Dataset]. http://doi.org/10.5281/zenodo.3832090
    Available download formats: zip
    Dataset updated
    May 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maurice Vanderfeesten; Linda Hasse
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This package contains data on five text analysis types (term extraction, contrast analysis, topic modeling, network mapping, and a contingency matrix), based on the survey data in which researchers selected research output related to the 17 Sustainable Development Goals (SDGs). This is used as input to improve the current SDG classification model from v4.0 to v5.0.

    The Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each goal, specific targets and indicators are defined to monitor progress towards reaching those goals by 2030. In an effort to capture how research contributes to moving the needle on those challenges, we earlier built an initial classification model that makes it possible to quickly identify which research output is related to which SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)

    The initiative started in 2017 within the Aurora Universities Network, in the working group "Societal Impact and Relevance of Research", to investigate and make visible 1. what research is being done that is relevant to topics or challenges that live in society (for the proof of practice this has been scoped down to the SDGs), and 2. what the effect or impact is of implementing those research outcomes on those societal challenges (this has also been scoped down to research output being cited in policy documents from national and local governments and NGOs).

    Context of this dataset | classification model improvement workflow

    The classification model we have used consists of 17 different search queries on the Scopus database.

    Methods used to do the text analysis

    1. Term Extraction: after text normalisation (stemming, etc.) we extracted the 2 terms (bigrams and trigrams) that co-occurred the most per document, from the title, abstract and keywords (a simplified sketch follows this list).
    2. Contrast analysis: the co-occurring terms in publications (title, abstract, keywords) of the papers that respondents have indicated relate to this SDG (y-axis: True), and of those that have been rejected (x-axis: False). In the top left you'll see term co-occurrences that clearly relate to this SDG. The bottom right shows terms that appear in papers that have been rejected for this SDG. The top-right terms appear frequently in both groups and cannot be used to discriminate between the two.
    3. Network map: This diagram shows the cluster network of terms co-occurring in the publications related to this SDG, as selected by the respondents (accepted publications only).
    4. Topic model: This diagram shows the topics, and the related terms that make up each topic. The number of topics is related to the number of targets of this SDG.
    5. Contingency matrix: This diagram shows the top 10 co-occurring terms that correlate the most.
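    For readers who want a rough, reproducible picture of step 1, the sketch below extracts the most frequent bigrams and trigrams from a few title/abstract strings with scikit-learn. It is not the CorTexT pipeline used for this dataset; the normalisation is limited to lowercasing and English stop-word removal, and the example documents are invented.

      # Minimal sketch of bigram/trigram term extraction (not the CorTexT pipeline).
      from sklearn.feature_extraction.text import CountVectorizer

      documents = [  # illustrative title + abstract strings
          "Renewable energy transition and affordable clean energy access",
          "Clean energy access in developing countries and energy poverty",
      ]

      vectorizer = CountVectorizer(ngram_range=(2, 3), stop_words="english")
      counts = vectorizer.fit_transform(documents)

      # Sum counts over all documents and list the most frequent n-grams.
      totals = counts.sum(axis=0).A1
      terms = vectorizer.get_feature_names_out()
      for term, freq in sorted(zip(terms, totals), key=lambda t: t[1], reverse=True)[:10]:
          print(term, freq)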

    Software used to do the text analyses

    CorTexT: The CorTexT Platform is the digital platform of the LISIS Unit and a project launched and sustained by IFRIS and INRAE. This platform aims at empowering open research and studies in the humanities on the dynamics of science, technology, innovation and knowledge production.

    Resource with interactive visualisations

    Based on the text analysis data we have created a website that puts all the SDG interactive diagrams together for you to scroll through: https://sites.google.com/vu.nl/sdg-survey-analysis-results/

    Data set content

    In the dataset root you'll find the following folders and files:

    • /sdg01-17/
      • This contains the text analysis for all the individual SDG surveys.
    • /methods/
      • This contains the step-by-step explanations of the text analysis methods using Cortext.
    • /images/
      • images of the results used in this README.md.
    • LICENSE.md
      • terms and conditions for reusing this data.
    • README.md
      • description of the dataset; each subfolder contains a README.md file to further describe the content of that sub-folder.

    Inside an /sdg01-17/-folder you'll find the following:

    • /sdg01-17/sdg04-sdg-survey-selected-publications-combined.db
      • This contains the title, abstract and keywords of the publications in the survey, including the accept or reject status and the number of respondents.
    • /sdg01-17/sdg04-sdg-survey-selected-publications-combined-accepted-accepted-custom-filtered.db
      • same as above, but only the accepted papers
    • /sdg01-17/extracted-terms-list-top1000.csv
      • the aggregated list of co-occurring terms (bigrams and trigrams) extracted per paper.
    • /sdg01-17/contrast-analysis/
      • This contains the data and visualisation of the terms appearing in papers that have been accepted (true) and rejected (false) to be relating to this SDG.
    • /sdg01-17/topic-modelling/
      • This contains the data and visualisation of the terms clustered in the same number of topics as there are 'targets' within that SDG.
    • /sdg01-17/network-mapping/
      • This contains the data and visualisation of the terms clustered by co-occurring proximity of appearance in papers.
    • /sdg01-17/contingency-matrix/
      • This contains the data and visualisation of the top 10 co-occurring terms.

    note: the .csv files are actually tab-separated.
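    Because of this, loading them with a plain CSV reader needs an explicit separator. The snippet below is a minimal sketch using pandas; the path refers to the extracted-terms list described above (adjust the SDG folder name to the one you are working with), and the column names are whatever the file actually contains.

      # Minimal sketch: load one of the tab-separated ".csv" files with pandas.
      import pandas as pd

      # Path follows the dataset layout described above; adjust the SDG folder as needed.
      terms = pd.read_csv("sdg04/extracted-terms-list-top1000.csv", sep="\t")

      print(terms.shape)
      print(terms.head())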

    Contribute and improve the SDG Search Queries

    We welcome you to join the Github community and to fork, branch, improve and make a pull request to add your improvements to the new version of the SDG queries. https://github.com/Aurora-Network-Global/sdg-queries

  2. Data from: SHREC'09 track: querying with partial models

    • catalog.data.gov
    • data.nist.gov
    • +1more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). SHREC'09 track: querying with partial models [Dataset]. https://catalog.data.gov/dataset/shrec09-track-querying-with-partial-models-0b8d2
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    There are two objectives of this shape retrieval contest: a) to evaluate partial similarity between query and target objects and retrieve complete 3D models that are relevant to a partial query object; b) to retrieve 3D models that are relevant to a query depth map. This task corresponds to a real-life scenario where the query is a 3D range scan of an object acquired from an arbitrary view direction. The algorithm should retrieve the relevant 3D objects from a database.

    Task description: In response to a given set of queries, the task is to evaluate similarity scores with the target models and return an ordered ranked list along with the similarity scores for each query. The set of queries consists either of partial 3D models or of range images. The participants may present ranked lists for either of the query sets or both; there is no obligation to submit ranked lists for both.

    Dataset: The first query set consists of 20 partial 3D models which are obtained by cutting parts from complete models. The objective is to retrieve the models which the query part may belong to. The file format of the partial query models is the ASCII Object File Format (.off). The second query set is composed of 20 range images, which are acquired by capturing range data of 20 models from arbitrary view directions. The range images are captured using a desktop 3D scanner; each scan is provided as a triangular mesh in the ASCII Object File Format (.off). The target database is the same for both query sets and contains 720 complete 3D models, categorized into 40 classes with 18 models per class. The file format of the 3D models is the ASCII Object File Format (*.off). The distribution includes the 3D models, classification files, evaluation software and images.

    Paper: Dutagaci, Helin; Godil, Afzal; et al. "SHREC'09 track: querying with partial models." Proceedings of the 2nd Eurographics Conference on 3D Object Retrieval. Eurographics Association, 2009. https://doi.org/10.5555/2381128.2381144

  3. COKI Language Dataset

    • zenodo.org
    application/gzip, csv
    Updated Jun 16, 2022
    Cite
    James P. Diprose; Cameron Neylon (2022). COKI Language Dataset [Dataset]. http://doi.org/10.5281/zenodo.6636625
    Available download formats: application/gzip, csv
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    James P. Diprose; Cameron Neylon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.

    Methodology
    A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
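    As a rough illustration of the prediction step described above, the sketch below runs the same lid.176.bin fastText model over a single title-plus-abstract string. It is not the project's processing script (see the Code section below); the concatenation and pre-processing are simplified here, and the model file is assumed to have been downloaded from the fastText page referenced in [2].

      # Minimal sketch: fastText language identification for one publication record.
      # Assumes lid.176.bin has been downloaded from the fastText site ([2] below).
      import fasttext

      model = fasttext.load_model("lid.176.bin")

      title = "A survey of language identification methods"
      abstract = "We compare statistical and neural approaches to language identification."
      text = f"{title} {abstract}".replace("\n", " ")  # fastText expects a single line

      labels, probabilities = model.predict(text)
      iso_code = labels[0].replace("__label__", "")  # e.g. "en"
      print(iso_code, round(float(probabilities[0]), 4))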

    Query or Download
    The data is publicly accessible in BigQuery in the following two tables:

    When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.

    See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.
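    For anyone querying the BigQuery copy, the sketch below shows the general pattern with the google-cloud-bigquery client. The project, dataset, table, and column identifiers are placeholders, since the exact names are listed in the README rather than in this description; substitute the real ones before running.

      # Minimal sketch: query the language predictions from your own Google Cloud project.
      # Table and column names below are placeholders -- take the real ones from the README.
      from google.cloud import bigquery

      client = bigquery.Client(project="your-gcp-project")  # must be your own project

      query = """
      SELECT doi, title, language_code, probability
      FROM `your-gcp-project.your_dataset.coki_language`
      WHERE language_code != 'en'
      LIMIT 10
      """

      for row in client.query(query).result():
          print(row.doi, row.language_code, row.probability)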

    Code
    The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language

    License
    COKI Language Dataset © 2022 by Curtin University is licenced under CC BY 4.0.

    Attributions
    This work contains information from:

    References
    [1] https://doi.org/10.5281/zenodo.6366695
    [2] https://fasttext.cc/docs/en/language-identification.html
    [3] https://modelpredict.com/language-identification-survey

  4. MAST STScI CAOM and ObsCore TAP service - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Dec 20, 2022
    Cite
    (2022). MAST STScI CAOM and ObsCore TAP service - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/428b439b-4eff-589c-b4e2-930c806134b6
    Dataset updated
    Dec 20, 2022
    Description

    The MAST Archive at STScI TAP endpoint for observational data, saved in the Common Archive Data Model format and made available through the ObsCore limited view. The Table Access Protocol (TAP) lets you execute queries against our database tables and inspect various metadata. Upload is not currently supported. Missions and projects with data available through the CAOM TAP service include: BEFS, EUVE, FUSE, GALEX, HLA, HST, HUT, IUE, JWST, K2, KEPLER, PS1 (PanSTARRS 1) Data Release 2, SPITZER_SHA, SWIFT, TESS, TUES, WUPPE.
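    Since this record only describes a TAP endpoint, a query sketch may be useful. The example below uses pyvo against the standard ObsCore table (ivoa.obscore); the endpoint URL is a placeholder and should be replaced with the service URL given on the MAST/B2FIND landing page, and the selected columns are standard ObsCore ones.

      # Minimal sketch: run an ObsCore query through the TAP service with pyvo.
      # TAP_URL is a placeholder -- use the service URL from the MAST/B2FIND record.
      import pyvo

      TAP_URL = "https://<mast-caom-tap-endpoint>"  # placeholder
      service = pyvo.dal.TAPService(TAP_URL)

      # ObsCore defines a standard ivoa.obscore view; select a few JWST observations.
      results = service.search(
          "SELECT TOP 10 obs_id, target_name, t_min, t_max "
          "FROM ivoa.obscore WHERE obs_collection = 'JWST'"
      )
      print(results.to_table())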

  5. Supplementary data: "Open modeling of electricity and heat demand curves for...

    • zenodo.org
    bin, zip
    Updated Sep 14, 2022
    Cite
    Clara Büttner; Jonathan Amme; Julian Endres; Aadit Malla; Birgit Schachler; Ilka Cußmann (2022). Supplementary data: "Open modeling of electricity and heat demand curves for all residential buildings in Germany" [Dataset]. http://doi.org/10.5281/zenodo.6771218
    Available download formats: zip, bin
    Dataset updated
    Sep 14, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Clara Büttner; Jonathan Amme; Julian Endres; Aadit Malla; Birgit Schachler; Ilka Cußmann
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    THIS VERSION IS OUTDATED, PLEASE CHECK OUT THE LAST VERSION HERE: https://zenodo.org/record/6771217


    This repository contains result data for the paper "Open modeling of electricity and heat demand curves for all residential buildings in Germany".

    The published data includes residential electricity and heat demand profiles for every building in Germany. It was created with the open source tool eGon-data within the research project eGon. All input data sets as well as the code are available under open source licenses.

    Files

    • The profile data is stored as a PostgreSQL database in the attached backup file. The data can be restored using e.g. pgAdmin or PostgreSQL's pg_restore command. See section Database structure below for details.
    • Unpack the zip files.
    • The directory scripts/ contains example scripts to obtain electricity and heat profiles.
    • The directory additional_data/ contains TRY climate zones and weather data which can be used to extract heat profiles.
    • See section Example scripts below for details.

    Database structure

    After restoring the backup file, the data is stored in different schemas: society, openstreetmap and demand. Different tables have to be combined to create the final demand time series for heat and electricity. In the following, the tables and the matching methods are described.

    The schema society includes data from Census 2011 on population in 100m x 100m cells ('Census cells'). The cells are georeferenced and have a unique id.

    Schema: society

    • destatis_zensus_population_per_ha_inside_germany
      National census in Germany in 2011, bounded by Germany's borders.
      • id: Unique identifier
      • grid_id: Grid number of Census
      • population: Number of registered residents
      • geom_point: Geometry centroid (CRS: ETRS89)
      • geom: Geometry (CRS: ETRS89)

    Schema: openstreetmap

    The schema openstreetmap includes data on residential buildings. All buildings carry an internal building_id. All residential buildings extracted from OpenStreetMap are stored in openstreetmap.osm_buildings_residential, including the osm_id and the internal building_id. Additionally, synthetic buildings are stored in openstreetmap.osm_buildings_synthetic.

    • osm_buildings_residential: Filtered list of residential buildings from OpenStreetMap - (c) OpenStreetMap contributors
      • id: Building id
      • osm_id: OpenStreetMap id
      • amenity: Amenity in building
      • building: Type of building
      • name: Name of the building
      • geom: Polygon of building (CRS: ETRS89)
      • area: Surface area of building
      • geom_point: Centroid of building (CRS: ETRS89)
      • tags: OpenStreetMap tags
    • osm_buildings_synthetic: List of generated synthetic buildings
      • id: Building id
      • geom: Polygon of building (CRS: ETRS89)
      • geom_point: Centroid of building (CRS: ETRS89)
      • grid_id: Census grid id (reference to: society.destatis_zensus_population_per_ha_inside_germany.grid_id)
      • cell_id: Census cell id (reference to: society.destatis_zensus_population_per_ha_inside_germany.id)
      • building: Building type (residential)
      • area: Surface area

    Schema: demand

    With the profile_ids in egon_household_electricity_profile_of_buildings, specific profiles from iee_household_load_profiles are mapped to all residential buildings. The profiles then need to be scaled by their annual sum and the corresponding scaling factors, which can be found in egon_household_electricity_profile_in_census_cell and matched per census cell id.

    • egon_household_electricity_profile_in_census_cell: Mapping table for household electricity profiles to census cells, including scaling factors for two scenarios (eGon2035, eGon100RE).
      • cell_id: Census cell id (reference to: society.destatis_zensus_population_per_ha_inside_germany.id)
      • grid_id: Census grid id
      • cell_profile_ids: Household profile ids
      • nuts3: NUTS 3 code
      • nuts1: NUTS 1 code
      • factor_2035: Scaling factor for scenario eGon2035
      • factor_2050: Scaling factor for scenario eGon100RE
    • iee_household_load_profiles: 100,000 annual profiles in hourly resolution of the electricity demand of private households, for different household types (singles, couples, other) with varying numbers of elderly and children. The profiles were created using a bottom-up load profile generator by Fraunhofer IEE, developed in the Bachelor's thesis "Auswirkungen verschiedener Haushaltslastprofile auf PV-Batterie-Systeme" by Jonas Haack, Fachhochschule Flensburg, December 2012. The columns are named as follows: "
    • egon_household_electricity_profile_of_building: Mapping table for household electricity profiles to buildings via internal building_id and corresponding census cell_id.
      • id: Unique identifier
      • building_id: Building id (reference to: osm_buildings_residential.id, osm_buildings_synthetic.id)
      • cell_id: Census cell id (reference to: society.destatis_zensus_population_per_ha_inside_germany.id)
      • profile_id: Household profile id (reference to: iee_household_load_profiles.type)

    Heat demand profiles per building can be created by combining the tables egon_peta_heat, heat_idp_pool and heat_timeseries_selected_profiles. In addition, weather data (e.g. from ERA5, located in additional_data/) is needed to distribute the annual heat demands to single days. This is included in the example script, the usage is described below.

    • egon_peta_heat: Table for annual heat demands of residential and service sector per Census cell
      • demand: Annual heat demand in MWh
      • id: Unique identifier
      • scenario: Scenario name (either eGon2035 or eGon100RE)
      • sector: Demand sector (either 'residential' or 'service')
      • zensus_population_id: id of the Census cell (reference to: society.destatis_zensus_population_per_ha_inside_germany.id)
    • heat_idp_pool: About 460,000 individual daily heat demand profiles per building, including the temperature class and building type.
      • house: Single- or multi-family house
      • idp: Normalized demand timeseries for one day (24 hours)
      • index: Unique identifier
      • temperature_class: Number of corresponding temperature class
    • heat_timeseries_selected_profiles: Mapping table for household heat profiles to buildings per day via internal building_id and corresponding census cell_id.
      • ID: Unique identifier
      • building_id: Id of the corresponding building (reference to: osm_buildings_residential.id, osm_buildings_synthetic.id)
      • selected_ipd_profiles: Array of selected profiles per day (values in array refer to: heat_idp_pool.index)
      • zensus_population_id: id of corresponding Census cell (reference to: society.destatis_zensus_population_per_ha_inside_germany.id)

    Weather data and the used climate zones are not included in the database. They are stored in files which are part of the additional_data/ folder. In this folder, you find the following data sets:

    Example queries

    Electricity profiles: The demand profiles for residential buildings can be obtained using the tables stored in the demand schema. To extract electricity demand profiles, the following tables have to be combined:

    • egon_household_electricity_profile_in_census_cell
    • iee_household_load_profiles
    • egon_household_electricity_profile_of_building

    Example script to obtain the electrical demand timeseries for 1 specific building for the eGon2035 scenario:
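    The repository's example script is not reproduced in this description, but the join logic follows from the tables above. The sketch below is a minimal, hypothetical query with psycopg2 against a local restore of the backup (the connection parameters are placeholders): it fetches the household profile ids and the eGon2035 scaling factor for one building; reading and scaling the actual time series from iee_household_load_profiles is left as a comment because that table's columns are not fully listed here.

      # Minimal sketch (not the repository's example script): profile ids and the
      # eGon2035 scaling factor for one building from a locally restored backup.
      import psycopg2

      BUILDING_ID = 1  # hypothetical building id

      query = """
      SELECT prof.profile_id,
             cell.factor_2035
      FROM   demand.egon_household_electricity_profile_of_building AS prof
      JOIN   demand.egon_household_electricity_profile_in_census_cell AS cell
             ON cell.cell_id = prof.cell_id
      WHERE  prof.building_id = %s;
      """

      with psycopg2.connect(dbname="egon_data", user="postgres", host="localhost") as conn:
          with conn.cursor() as cur:
              cur.execute(query, (BUILDING_ID,))
              for profile_id, factor_2035 in cur.fetchall():
                  # The actual time series would be read from
                  # demand.iee_household_load_profiles via profile_id and then
                  # scaled by factor_2035, as described above.
                  print(profile_id, factor_2035)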

  6. Database Development and Management Tools Software Market Research Report...

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Jun 30, 2025
    Cite
    Growth Market Reports (2025). Database Development and Management Tools Software Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/database-development-and-management-tools-software-market-global-industry-analysis
    Available download formats: pptx, pdf, csv
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Database Development and Management Tools Software Market Outlook




    According to our latest research, the global database development and management tools software market size reached USD 15.8 billion in 2024, reflecting robust demand across diverse sectors. The market is anticipated to expand at a CAGR of 13.2% during the forecast period, propelling the market to an estimated USD 44.2 billion by 2033. This impressive growth is driven by the escalating need for efficient data management, the proliferation of cloud-based solutions, and the increasing complexity of enterprise data environments. As organizations worldwide continue to digitize their operations and harness big data analytics, the demand for advanced database development and management tools software is set to surge.




    One of the primary growth factors for the database development and management tools software market is the exponential increase in data volumes generated by businesses, governments, and individuals alike. The digital transformation wave sweeping across industries necessitates robust solutions for storing, organizing, and retrieving vast datasets with high reliability and speed. Organizations are increasingly leveraging data-driven insights to enhance decision-making, optimize operations, and personalize customer experiences. This reliance on data has compelled enterprises to invest in sophisticated database development and management tools that can handle complex queries, streamline data modeling, and ensure data integrity. As a result, both established enterprises and emerging startups are prioritizing investments in this market, further fueling its expansion.




    Another significant driver of market growth is the rapid adoption of cloud computing technologies. Cloud-based database management solutions offer unparalleled scalability, flexibility, and cost-effectiveness compared to traditional on-premises systems. With organizations seeking to minimize IT infrastructure costs and improve accessibility, cloud deployment models are gaining substantial traction. This shift is particularly pronounced among small and medium enterprises (SMEs), which benefit from the reduced upfront investment and operational agility provided by cloud solutions. Additionally, the integration of artificial intelligence and machine learning capabilities into database tools is enabling automated performance monitoring, predictive maintenance, and advanced security management, further enhancing the value proposition of these solutions.




    The growing emphasis on data security and regulatory compliance is also shaping the trajectory of the database development and management tools software market. With the rising incidence of cyberattacks and stringent data protection regulations such as GDPR, HIPAA, and CCPA, organizations are under pressure to safeguard sensitive information and ensure compliance. Advanced database management tools now incorporate robust security features, including encryption, access controls, and real-time threat detection, to address these concerns. Vendors are continuously innovating to provide end-to-end security management and automated compliance reporting, making their solutions indispensable for businesses operating in highly regulated industries such as BFSI, healthcare, and government.




    Regionally, North America continues to dominate the market, accounting for the largest revenue share in 2024, followed closely by Europe and the Asia Pacific. The presence of leading technology providers, early adoption of digital technologies, and a strong focus on innovation contribute to North America's leadership. Meanwhile, the Asia Pacific region is experiencing the fastest growth, driven by rapid industrialization, increasing IT investments, and the proliferation of cloud-based services in emerging economies such as China and India. Europe maintains a steady growth trajectory, supported by stringent data protection regulations and a mature enterprise IT landscape. Latin America and the Middle East & Africa are also witnessing increased adoption, albeit at a slower pace, as organizations in these regions gradually embrace digital transformation.



  7. 2023 Irrigated Lands for the Mountain Home Plateau: Machine Learning Generated

    • data-idwr.hub.arcgis.com
    • gis-idaho.hub.arcgis.com
    Updated May 15, 2024
    Cite
    Idaho Department of Water Resources (2024). 2023 Irrigated Lands for the Mountain Home Plateau: Machine Learning Generated [Dataset]. https://data-idwr.hub.arcgis.com/documents/b5c6474cb4ae459480bb804127c4831e
    Dataset updated
    May 15, 2024
    Dataset authored and provided by
    Idaho Department of Water Resources
    Description

    This raster file represents land within the Mountain Home study boundary classified as either “irrigated” with a cell value of 1 or “non-irrigated” with a cell value of 0 at a 10-meter spatial resolution. These classifications were determined at the pixel level by use of Random Forest, a supervised machine learning algorithm. Classification models often employ Random Forest due to its accuracy and efficiency at labeling large spatial datasets. To build a Random Forest model and supervise the learning process, IDWR staff create pre-labeled data, or training points, which are used by the algorithm to construct decision trees that will be later used on unseen data. Model accuracy is determined using a subset of the training points, otherwise known as a validation dataset.

    Several satellite-based input datasets are made available to the Random Forest model, which aid in distinguishing characteristics of irrigated lands. These characteristics allow patterns to be established by the model, e.g., high NDVI during summer months for cultivated crops, or consistently low ET for dryland areas. Mountain Home Irrigated Lands 2023 employed the following input datasets: US Geological Survey (USGS) products, including Landsat 8/9 and the 10-meter 3DEP DEM, and European Space Agency (ESA) Copernicus products, including Harmonized Sentinel-2 and Global 30m Height Above Nearest Drainage (HAND). For the creation of manually labeled training points, IDWR staff accessed the following datasets: NDVI derived from Landsat 8/9, Sentinel-2 CIR imagery, US Department of Agriculture National Agricultural Statistics Service (USDA NASS) Cropland Data Layer, Active Water Rights Place of Use data from IDWR, and USDA’s National Agriculture Imagery Program (NAIP) imagery. All datasets were available for the current year of interest (2023).

    The published Mountain Home Irrigated Lands 2023 land classification raster was generated after four model runs, where at each iteration, IDWR staff added or removed training points to help improve results. Early model runs showed poor results in riparian areas near the Snake River, concentrated animal feeding operations (CAFOs), and non-irrigated areas at higher elevations. These issues were resolved after several model runs in combination with post-processing masks. Masks used include Fish and Wildlife Service’s National Wetlands Inventory (FWS NWI) data. These data were amended to exclude polygons overlying irrigated areas, and to expand riparian area in specific locations. A manually created mask was primarily used to fill in areas around the Snake River that the model did not uniformly designate as irrigated. Ground-truthing and a thorough review of IDWR’s water rights database provided further insight for class assignments near the town of Mayfield. Lastly, the Majority Filter tool in ArcGIS was applied using a kernel of 8 nearest neighbors to smooth out “speckling” within irrigated fields. The masking datasets and the final iteration of training points are available on request.

    Information regarding Sentinel and Landsat imagery: All satellite data products used within the Random Forest model were accessed via the Google Earth Engine API. To find more information on the Sentinel data used, query the Earth Engine Data Catalog (https://developers.google.com/earth-engine/datasets) using “COPERNICUS/S2_SR_HARMONIZED.” Information on the Landsat datasets used can be found by querying “LANDSAT/LC08/C02/T1_L2” (for Landsat 8) and “LANDSAT/LC09/C02/T1_L2” (for Landsat 9). Each satellite product has several bands of available data. For our purposes, shortwave infrared 2 (SWIR2), blue, Normalized Difference Vegetation Index (NDVI), and near infrared (NIR) were extracted from both Sentinel and Landsat images. These images were later interpolated to the following dates: 2023-04-15, 2023-05-15, 2023-06-14, 2023-07-14, 2023-08-13, 2023-09-12. Interpolated values were taken from up to 45 days before and after each interpolated date. April-June interpolated Landsat images, as well as the April interpolated Sentinel image, were not used in the model given the extent of cloud cover overlying irrigated area. For more information on the pre-processing of satellite data used in the Random Forest model, please reach out to IDWR at gisinfo@idwr.idaho.gov.
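    As a sketch of how the named catalog assets can be pulled for an area of interest, the snippet below uses the Earth Engine Python API to load the harmonized Sentinel-2 collection and add an NDVI band. The point coordinates are only an approximate location for the Mountain Home Plateau, and the date range and cloud filter are illustrative rather than the exact pre-processing described above.

      # Minimal sketch (not IDWR's workflow): pull Sentinel-2 SR data and add NDVI
      # with the Earth Engine Python API. Requires an authenticated Earth Engine account.
      import ee

      ee.Initialize()

      # Approximate point on the Mountain Home Plateau (illustrative only).
      aoi = ee.Geometry.Point([-115.7, 43.1]).buffer(5000)

      def add_ndvi(image):
          # NDVI from the near-infrared (B8) and red (B4) Sentinel-2 bands.
          return image.addBands(image.normalizedDifference(["B8", "B4"]).rename("NDVI"))

      s2 = (
          ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
          .filterBounds(aoi)
          .filterDate("2023-05-01", "2023-09-30")
          .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
          .map(add_ndvi)
      )

      # Median summer composite of the NDVI band over the area of interest.
      ndvi_composite = s2.select("NDVI").median().clip(aoi)
      print(s2.size().getInfo(), "images in the filtered collection")
      print(ndvi_composite.bandNames().getInfo())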

  8. InQuartik-ASSIGNEE QUERY: Patent Owner Data Worldwide with 50+ Years of History

    • datarade.ai
    .json, .csv, .xls
    Updated Dec 1, 2021
    Cite
    InQuartik (2021). InQuartik-ASSIGNEE QUERY: Patent Owner Data Worldwide with 50+ Years of History [Dataset]. https://datarade.ai/data-products/inquartik-assignee-query-patent-owner-data-worldwide-with-inquartik
    Available download formats: .json, .csv, .xls
    Dataset updated
    Dec 1, 2021
    Dataset authored and provided by
    InQuartik
    Area covered
    Philippines, Hong Kong, Dominican Republic, Switzerland, Austria, Honduras, Bahamas, France, Puerto Rico, Italy
    Description

    Are you looking for data that tells you whether the companies or persons you are looking into own any patents? If they do, do you want to know how many patents they own?

    The Assignee Query Data provides a timely and comprehensive view of the global patent ownership of companies or individuals, with 50 years of history.

    How do we do that?

    We include decades’ worth of global full-text databases, such as the US, China, EM/EUIPO, Japan, Korea, WIPO and so on, and keep them updated on a timely basis—as frequently as every day or week, depending on the sources.

    Furthermore, the downloaded data are cleansed to minimize data errors and, in turn, search and analysis errors. For example, we standardize assignee names to enable individual patents to correspond to a single owner, and logic-based corrections ensure that values are corrected based on rules.

    In addition, we use advanced algorithms in analyzing, selecting, and presenting the most current and accurate information from multiple available data sources. For instance, a single patent’s legal status is triangulated across different patent data for accuracy. Moreover, proprietary Quality and Value rankings put patents in each key market under the equally evaluative process, offering subjective predictions for the patent's likelihood of validity and monetization.

  9. SHREC'14 Track: Large Scale Comprehensive 3D Shape Retrieval

    • catalog.data.gov
    • data.nist.gov
    • +2more
    Updated Jul 29, 2022
    Cite
    National Institute of Standards and Technology (2022). SHREC'14 Track: Large Scale Comprehensive 3D Shape Retrieval [Dataset]. https://catalog.data.gov/dataset/shrec14-track-large-scale-comprehensive-3d-shape-retrieval-51330
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    Objective: The objective of this track is to evaluate the performance of 3D shape retrieval approaches on a large-scale comprehensive 3D shape database which contains different types of models, such as generic, articulated, CAD and architecture models.

    Introduction: With the increasing number of 3D models created every day and stored in databases, the development of effective and scalable 3D search algorithms has become an important research area. In this contest, the task is to retrieve 3D models similar to a complete 3D model query from a new integrated large-scale comprehensive 3D shape benchmark including various types of models. Owing to the integration of the most important existing benchmarks to date, the newly created benchmark is the most exhaustive to date in terms of the number of semantic query categories covered, as well as the variation of model types. The shape retrieval contest will allow researchers to evaluate results of different 3D shape retrieval approaches when applied to a large-scale comprehensive 3D database. The benchmark is motivated by a recent large collection of human sketches built by Eitz et al. [1]. To explore how humans draw sketches and how humans recognize them, they collected 20,000 human-drawn sketches, categorized into 250 classes, each with 80 sketches. This sketch dataset is exhaustive in terms of the number of object categories. Thus, we believe that a 3D model retrieval benchmark based on their object categorization will be more comprehensive and appropriate than currently available 3D retrieval benchmarks to more objectively and accurately evaluate the real practical performance of a comprehensive 3D model retrieval algorithm if implemented and used in the real world. Considering this, we built the SHREC'14 Large Scale Comprehensive Track Benchmark (SHREC14LSGTB) by collecting relevant models from the major previously proposed 3D object retrieval benchmarks. Our target is to find models for as many of the 250 classes as possible, and to find as many models as possible for each class. These previous benchmarks have been compiled with different goals in mind and, to date, have not been considered in their sum. Our work is the first to integrate them to form a new, larger benchmark corpus for comprehensive 3D shape retrieval.

    Dataset: The SHREC'14 Large Scale Comprehensive Retrieval Track Benchmark has 8,987 models, categorized into 171 classes. We adopt a voting scheme to classify models. For each classification, we have at least two votes. If these two votes agree with each other, we confirm that the classification is correct; otherwise, we perform a third vote to finalize the classification. All the models are categorized according to the classifications in Eitz et al. [1], based on visual similarity.

    Evaluation Method: To have a comprehensive evaluation of the retrieval algorithms, we employ seven commonly adopted performance metrics in 3D model retrieval.

    Please cite the papers:
    [1] Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Martin Burtscher, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Hongbo Fu, Takahiko Furuya, Haisheng Li, Jianzhuang Liu, Henry Johan, Ryuichi Kosaka, Hitoshi Koyanagi, Ryutarou Ohbuchi, Atsushi Tatsuma, Yajuan Wan, Chaoli Zhang, Changqing Zou. A Comparison of 3D Shape Retrieval Methods Based on a Large-scale Benchmark Supporting Multimodal Queries. Computer Vision and Image Understanding, November 4, 2014.
    [2] Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Takahiko Furuya, Henry Johan, Ryuichi Kosaka, Hitoshi Koyanagi, Ryutarou Ohbuchi, Atsushi Tatsuma. SHREC'14 Track: Large Scale Comprehensive 3D Shape Retrieval. Eurographics Workshop on 3D Object Retrieval 2014 (3DOR 2014): 131-140, 2014.

  10. Human Instructions - Multilingual (wikiHow)

    • kaggle.com
    Updated Mar 17, 2017
    Cite
    Paolo Pareti (2017). Human Instructions - Multilingual (wikiHow) [Dataset]. https://www.kaggle.com/paolop/human-instructions-multilingual-wikihow/tasks
    Available download format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 17, 2017
    Dataset provided by
    Kaggle
    Authors
    Paolo Pareti
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Human Instructions Dataset - Multilingual

    Updated JSON files for English at this other Kaggle repository

    Available in 16 Different Languages Extracted from wikiHow

    Overview

    Step-by-step instructions have been extracted from wikiHow in 16 different languages and decomposed into a formal graph representation like the one shown in the picture below. The source pages from which the instructions have been extracted have also been collected, and they can be shared upon request.

    Instructions are represented in RDF following the PROHOW vocabulary and data model. For example, the category, steps, requirements and methods of each set of instructions have been extracted.

    This dataset has been produced as part of the The Web of Know-How project.

    • To cite this dataset use: Paula Chocron, Paolo Pareti. Vocabulary Alignment for Collaborative Agents: a Study with Real-World Multilingual How-to Instructions. (PDF) (bibtex)

    Quick-Start: Instruction Extractor and Simplifier Script

    The large amount of data can make it difficult to work with this dataset. This is why an instruction-extraction python script was developed. This script allows you to:

    • select only the subset of the dataset you are interested in. For example only instructions from specific wikiHow pages, or instructions that fall within specific categories, such as cooking recipes, or those that have at least 5 steps, etc. The file class_hierarchy.ttl attached to this dataset is used to determine whether a set of instructions falls under a certain category or not.
    • simplify the data model of the instructions. The current data model is rich of semantic relations. However, this richness might make it complex to use. This script allows you to simplify the data model to make it easier to work with the data. An example graphical representation of this model is available here.

    The script is available on this GitHub repository.

    The Available Languages

    This page contains the link to the different language versions of the data.

    A previous version of this type of data, although for English only, is also available on Kaggle:

    For the multilingual dataset, this is the list of the available languages and number of articles in each:

    Querying the Dataset

    The dataset is in RDF and it can be queried in SPARQL. Sample SPARQL queries are available in this GitHub page.

    For example, [this SPARQL query](http://dydra.com/paolo-pareti/wikihow_multilingual/query?query=PREFIX%20prohow%3A%20%3Chttp%3A%2F%2Fw3id.org%2Fprohow%23%3E%20%0APREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%20%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%20%0APREFIX%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%20%0APREFIX%20oa%3A%20%3Chttp%3A%2F%2Fw...
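    As a local alternative to the hosted query endpoint linked above, the sketch below loads one of the RDF dumps with rdflib and counts steps per set of instructions. The file name is a placeholder, and the prohow:has_step property is assumed from the PROHOW vocabulary referenced above; check the sample queries on the GitHub page for the exact terms used in this dataset.

      # Minimal sketch: query a local RDF dump of the instructions with rdflib.
      # The file name is a placeholder; prohow:has_step is assumed from the PROHOW vocabulary.
      from rdflib import Graph

      g = Graph()
      g.parse("wikihow_instructions.ttl", format="turtle")  # placeholder dump file

      query = """
      PREFIX prohow: <http://w3id.org/prohow#>

      SELECT ?task (COUNT(?step) AS ?steps)
      WHERE {
        ?task prohow:has_step ?step .
      }
      GROUP BY ?task
      ORDER BY DESC(?steps)
      LIMIT 10
      """

      for row in g.query(query):
          print(row.task, row.steps)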

  11. The files on your computer

    • kaggle.com
    Updated Jan 15, 2017
    Cite
    cogs (2017). The files on your computer [Dataset]. https://www.kaggle.com/cogitoe/crab/metadata
    Available download format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    cogs
    Description

    Dataset: The files on your computer.

    Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.

    e.g. (Win)    C:\> crab C:\some\path\MyProject
    or (Mac)    $ crab /some/path/MyProject
    

    You get a CRAB> prompt where you can enter SQL queries on the data, e.g. Count files by extension

    SELECT extension, count(*) 
    FROM files 
    GROUP BY extension;
    

    e.g. List the 5 biggest directories

    SELECT parentpath, sum(bytes)/1e9 as GB 
    FROM files 
    GROUP BY parentpath 
    ORDER BY sum(bytes) DESC LIMIT 5;
    

    Crab provides a virtual table, fileslines, which exposes file contents to SQL

    e.g. Count TODO and FIXME entries in any .c files, recursively

    SELECT fullpath, count(*) FROM fileslines 
    WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c'
      and (data like '%TODO%' or data like '%FIXME%')
    GROUP BY fullpath;
    

    As well there are functions to run programs or shell commands on any subset of files, or lines within files e.g. (Mac) unzip all the .zip files, recursively

    SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/') 
    FROM files 
    WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';
    

    (Here -n tells unzip not to overwrite anything, and -d specifies target directory)

    There is also a function to write query output to file, e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file

    SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data) 
    FROM fileslines 
    WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt'
    ORDER BY data;
    

    In place of the interactive prompt you can run queries in batch mode. E.g. here is a one-liner that returns the full path of all the files in the current directory:

    C:\> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"
    

    Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.

    Crab is free for personal use, $5/mo commercial

    See more details here (mac): [http://etia.co.uk/][1] or here (win): [http://etia.co.uk/win/about/][2]

    An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.

    To scan your own files, and get access to the virtual tables and support functions you have to use the Crab SQLite shell, available for download from this page (Mac): [http://etia.co.uk/download/][3] or this page (Win): [http://etia.co.uk/win/download/][4]

    Content

    FILES TABLE

    The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'

    COLUMNS
     fileid (int) primary key -- files table row number, a unique id for each item
     name (text)        -- item name e.g. 'Hei.ttf'
     bytes (int)        -- item size in bytes e.g. 7502752
     depth (int)        -- how far scan recursed to find the item, starts at 0
     accessed (text)      -- datetime item was accessed
     modified (text)      -- datetime item was modified
     basename (text)      -- item name without path or extension, e.g. 'Hei'
     extension (text)     -- item extension including the dot, e.g. '.ttf'
     type (text)        -- item type, 'f' for file or 'd' for directory
     mode (text)        -- further type info and permissions, e.g. 'drwxr-xr-x'
     parentpath (text)     -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
     fullpath (text) unique  -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'
    
    PATHS
    1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
    2) Directory paths all have a '/' on the end.
    

    FILESLINES TABLE

    The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.

    This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.

    COLUMNS
     linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
     data (text)    -- data content of the files, one entry for each line
    

    FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.

    Example Gutenberg data

    An example SQLite database (Mac data), database.sqlite, has been uploaded for you to play with. It includes an example files table...

  12. Understanding the Influence of Parameter Value Uncertainty on Climate Model...

    • data.niaid.nih.gov
    • dataone.org
    • +1more
    zip
    Updated May 30, 2024
    Cite
    Sofia Ingersoll; Heather Childers; Sujan Bhattarai (2024). Understanding the Influence of Parameter Value Uncertainty on Climate Model Output: Developing an Interactive Web Dashboard [Dataset]. http://doi.org/10.5061/dryad.vq83bk422
    Available download formats: zip
    Dataset updated
    May 30, 2024
    Dataset provided by
    University of California, Santa Barbara
    Authors
    Sofia Ingersoll; Heather Childers; Sujan Bhattarai
    License

    CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html

    Description

    Scientists at the National Center for Atmospheric Research have recently carried out several experiments to better understand the uncertainties associated with future climate projections. In particular, the NCAR Climate and Global Dynamics Lab (CGDL) working group has completed a large Parameter Perturbation Experiment (PPE) utilizing the Community Land Model (CLM), testing the effects of 32 parameters over thousands of simulations over a range of 250 years. The CLM model experiment is focused on understanding uncertainty around biogeophysical parameters that influence the balance of chemical cycling and sequestration variables. The current website for displaying model results is not intuitive or informative to the broader scientific audience or the general public. The goal of this project is to develop an improved data visualization dashboard for communicating the results of the CLM PPE. The interactive dashboard would provide an interface where new or experienced users can query the experiment database to ask which environmental processes are affected by a given model parameter, or vice versa. Improving the accessibility of the data will allow professionals to use the most recent land parameter data when evaluating the impact of a policy or action on climate change.

    Methods

    Data Source:

    University of California, Santa Barbara – Climate and Global Dynamics Lab, National Center for Atmospheric Research: Parameter Perturbation Experiment (CGD NCAR PPE-5). https://webext.cgd.ucar.edu/I2000/PPEn11_OAAT/ (only public version of the data currently accessible; the data leveraged in this project is currently stored on the NCAR server and is not publicly available). Learn more about this complex data via this presentation by Katie Dagon and Daniel Kennedy: https://www.cgd.ucar.edu/events/seminar/2023/katie-dagon-and-daniel-kennedy-132940. The Parameter Perturbation Experiment data leveraged by our project was generated utilizing Community Land Model v5 (CLM5) predictions: https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.CLM_LAND_ONLY.html

    Data Processing: We were working inside NCAR's CASPER HPC cluster, which gave us direct access to the raw data files. We created a script to read in 500 LHC PPE simulations as a data set with inputs for a climate variable and time range. When reading in the cluster of simulations, a preprocess function performs dimensional reduction to simplify the data set for wrangling later. Once the data sets of interest were loaded, they were then ready for some dimensional corrections, quirks that come with using CESM data; our friends at NCAR CGDL provided us with the fix for the time-pairing bug. The other functions, which weigh each grid cell by land area, weigh each month according to its contribution to the number of days in a year, and calculate the global average of each simulation, were written by our team to wrangle the data so it is suitable for emulation. These files were saved so they could be leveraged later using a built-in if-else statement within the read_n_wrangle() function. The preprocessed data is then used in the GPR ML emulator to make 100 predictions for a climate variable of interest and each of the 32 individual parameters. To summarize briefly without getting too into the nitty gritty, our GPR emulator does 3 things:

    1. Simplifies the LHC data so it can look at one parameter at a time and assess its relationship with a climate variable.
    2. Applies Fourier Amplitude Sensitivity Analysis to identify relationships between parameters and climate variables; it helps us see what the key influencers are.
    3. In the full chaotic LHC, it can assess the covariance of the parameter-parameter predictions simultaneously (this is the R^2 value you'll see on your accuracy inset plot later).

    Additionally, it 'pickles' and saves the predictions and the trained gpr_model so they can be used for further analysis, exploration, and visualizations. The attributes and structures defined in the accompanying notebook outline the workflow used to generate the data in this repo; it pulls functions from utils.py to execute the desired commands, and below we look at the utils.py functions that are not explicitly defined in the notebook. General side note: if you explore the notebook explaining how the data was made, you'll notice you'll be transported to another repo in this organization, GaiaFuture. That's our prototype playground; it's a little messy because that's where we spent the second half of this project tinkering. The official repository is https://github.com/GaiaFuture/CLM5_PPE_Emulator.
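    For readers unfamiliar with the emulation step, the sketch below shows the general shape of a Gaussian Process Regression emulator in scikit-learn: fit on parameter values versus a global-mean climate variable, then predict over a dense parameter sweep. It is not the project's GPR code (which lives in the repository linked above), and the training data here are made up for illustration.

      # Minimal sketch of a one-parameter GPR emulator (not the project's implementation).
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, ConstantKernel

      rng = np.random.default_rng(0)

      # Illustrative training data: one perturbed parameter value per simulation (x)
      # and the corresponding global-mean climate variable from the PPE output (y).
      x_train = rng.uniform(0.0, 1.0, size=(40, 1))
      y_train = np.sin(3 * x_train[:, 0]) + 0.05 * rng.standard_normal(40)

      kernel = ConstantKernel(1.0) * RBF(length_scale=0.2)
      gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
      gpr.fit(x_train, y_train)

      # Predict (with uncertainty) over a dense sweep of the parameter range,
      # analogous to the 100 predictions per parameter described above.
      x_new = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
      mean, std = gpr.predict(x_new, return_std=True)
      print(mean[:5], std[:5])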

  13. MetaMath QA

    • kaggle.com
    Updated Nov 23, 2023
    Cite
    The Devastator (2023). MetaMath QA [Dataset]. https://www.kaggle.com/datasets/thedevastator/metamathqa-performance-with-mistral-7b/suggestions?status=pending
    Available download format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    MetaMath QA

    Mathematical Questions for Large Language Models

    By Huggingface Hub [source]

    About this dataset

    This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    Data Dictionary

    The MetaMathQA dataset contains three columns: response, type, and query.

    • Response: the response to the query given by the question answering system. (String)
    • Type: the type of query provided as input to the system. (String)
    • Query: the question posed to the system for which a response is required. (String)

    Preparing data for analysis

    Before you dive into analysis, it's important to familiarize yourself with the kind of data values present in each column, and to check whether any preprocessing needs to be done, such as removing unwanted characters or filling in missing values, so that the data can be used without issue while training or testing your model further down your process flow.

    ##### Training Models using Mistral 7B

    Mistral 7B is an open-source framework designed for building machine learning models quickly and easily from tabular (CSV) datasets such as this one ('MetaMathQA'). After collecting and preprocessing your dataset, Mistral 7B provides support for various machine learning algorithms, such as Support Vector Machines (SVM), logistic regression and decision trees, with powerful hyperparameter optimization techniques; after selecting an algorithm configuration, it is good practice to tune it further using GridSearchCV and RandomSearchCV during the model-building stage. After the selection process, you can validate the performance of the selected models through metrics such as accuracy, F1 score, precision and recall.

    ##### Testing phase:

    After the model-building phase, the trained models should be tested robustly against the evaluation metrics mentioned above. New test cases provided by domain experts can be run through the trained model as a quality-assurance check, and the resulting scores compared against the baseline metrics to assess confidence before updating the baseline and running further experiments. This keeps results relevant and the overall impact of inexactness-induced errors low.

    Research Ideas

    • Building natural language processing (NLP) models that better identify patterns and connections between questions, answers, and query types.
    • Studying how efficiently particular language features produce successful question-answering results for different types of queries.
    • Optimizing search algorithms that surface relevant answers based on the type of query.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                          |
    |:------------|:-------------------------------------|
    | response    | The response to the query. (String)  |
    | type        | The type of query. (String)          |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  14. Data from: DEEPEN 3D PFA Favorability Models and 2D Favorability Maps at...

    • catalog.data.gov
    • data.openei.org
    • +1more
    Updated Jan 20, 2025
    + more versions
    Cite
    National Renewable Energy Laboratory (2025). DEEPEN 3D PFA Favorability Models and 2D Favorability Maps at Newberry Volcano [Dataset]. https://catalog.data.gov/dataset/deepen-3d-pfa-favorability-models-and-2d-favorability-maps-at-newberry-volcano-7185c
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Area covered
    Newberry Volcano
    Description

    DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments. Part of the DEEPEN project involved developing and testing a methodology for a 3D play fairway analysis (PFA) for multiple play types (conventional hydrothermal, superhot EGS, and supercritical). This was tested using new and existing geoscientific exploration datasets at Newberry Volcano. This GDR submission includes images, data, and models related to the 3D favorability and uncertainty models and the 2D favorability and uncertainty maps.

    The DEEPEN PFA Methodology is based on the method proposed by Poux et al. (2020), which uses the Leapfrog Geothermal software with the Edge extension to conduct PFA in 3D. This method uses all available data to build a 3D geodata model which can be broken down into smaller blocks and analyzed with advanced geostatistical methods. Each data set is imported into a 3D model in Leapfrog and divided into smaller blocks. Conditional queries can then be used to assign each block an index value which conditionally ranks each block's favorability, from 0-5 with 5 being most favorable, for each model (e.g., lithologic, seismic, magnetic, structural). The values between 0-5 assigned to each block are referred to as index values. The final step of the process is to combine all the index models to create a favorability index. This involves multiplying each index model by a given weight and then summing the resulting values.

    The DEEPEN PFA Methodology follows this approach, but split up by the specific geologic components of each play type. These components are defined as follows for each magmatic play type:

    1. Conventional hydrothermal plays in magmatic environments: Heat, fluid, and permeability
    2. Superhot EGS plays: Heat, thermal insulation, and producibility (the ability to create and sustain fractures suitable for an EGS reservoir)
    3. Supercritical plays: Heat, supercritical fluid, pressure seal, and producibility (the proper permeability and pressure conditions to allow production of supercritical fluid)

    More information on these components and their development can be found in Kolker et al. (2022). For the purposes of subsurface imaging, it is easier to detect a permeable fluid-filled reservoir than it is to detect separate fluid and permeability components. Therefore, in this analysis, we combine fluid and permeability for conventional hydrothermal plays, and supercritical fluid and producibility for supercritical plays. More information on this process is described in the following sections. We also project the 3D favorability volumes onto 2D surfaces for simplified joint interpretation, and we incorporate an uncertainty component. Uncertainty was modeled using the best approach for the dataset in question, for the datasets where we had enough information to do so. Identifying which subsurface parameters are the least resolved can help qualify current PFA results and focus future efforts in data collection. Where possible, the resulting uncertainty models/indices were weighted using the same weights applied to the respective datasets, and summed, following the PFA methodology above, but for uncertainty.
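
    For illustration, the weighted-sum step reduces to straightforward array arithmetic once each index model has been exported on a common block grid; the sketch below uses made-up weights and random 0-5 index arrays, not values from this submission:

```python
import numpy as np

# Three illustrative index models on the same block grid, each ranked 0-5.
rng = np.random.default_rng(0)
heat_index = rng.integers(0, 6, size=(50, 50, 20)).astype(float)
fluid_perm_index = rng.integers(0, 6, size=(50, 50, 20)).astype(float)
structure_index = rng.integers(0, 6, size=(50, 50, 20)).astype(float)

# Illustrative weights for the components of one play type.
weights = {"heat": 0.5, "fluid_perm": 0.3, "structure": 0.2}

# Favorability index = sum of (weight * index model), block by block.
favorability = (
    weights["heat"] * heat_index
    + weights["fluid_perm"] * fluid_perm_index
    + weights["structure"] * structure_index
)

print(favorability.shape, favorability.min(), favorability.max())
```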
    There are two different versions of the Leapfrog model and associated favorability models:

    • v1.0: The first release in June 2023
    • v2.1: The second release, with improvements made to the earthquake catalog (included additional identified events, removed duplicate events), to the temperature model (fixed a deep BHT), and to the index models (updated the seismicity-heat source index models for supercritical and EGS, and the resistivity-insulation index models for all three play types). Also uses the jet color map rather than the magma color map for improved interpretability.
    • v2.1.1: Updated to include v2.0 uncertainty results (see below for uncertainty model versions)

    There are two different versions of the associated uncertainty models:

    • v1.0: The first release in June 2023
    • v2.0: The second release, with improvements made to the temperature and fault uncertainty models.

    ** Note that this submission is deprecated and that a newer submission, linked below and titled "DEEPEN Final 3D PFA Favorability Models and 2D Favorability Maps at Newberry Volcano", contains the final versions of these resources. **

  15. d

    Asset database for the Hunter subregion on 24 February 2016

    • data.gov.au
    • cloud.csiss.gmu.edu
    • +2more
    Updated Aug 9, 2023
    + more versions
    Cite
    Bioregional Assessment Program (2023). Asset database for the Hunter subregion on 24 February 2016 [Dataset]. https://data.gov.au/data/dataset/activity/a39290ac-3925-4abc-9ecb-b91e911f008f
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset authored and provided by
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.

    Asset database for the Hunter subregion on 24 February 2016 (V2.5) supersedes the previous version of the HUN Asset database V2.4 (Asset database for the Hunter subregion on 20 November 2015, GUID: 0bbcd7f6-2d09-418c-9549-8cbd9520ce18). It contains the Asset database (HUN_asset_database_20160224.mdb), a Geodatabase version for GIS mapping purposes (HUN_asset_database_20160224_GISOnly.gdb), the draft Water Dependent Asset Register spreadsheet (BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20160224.xlsx), a data dictionary (HUN_asset_database_doc_20160224.doc), and a folder (NRM_DOC) containing documentation associated with the Water Asset Information Tool (WAIT) process as outlined below. This version should be used for the Materiality Test (M2).

    The Asset database is registered to the BA repository as an ESRI personal geodatabase (.mdb, doubling as a MS Access database) that can store, query, and manage non-spatial data, while the spatial data is held in a separate file geodatabase joined by AID/ElementID.

    Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.

    Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database.

    Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "HUN_asset_database_doc_20160224.doc", located in this file.

    The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.

    Detailed information describing the database structure and content can be found in the document "HUN_asset_database_doc_20160224.doc" located in this file.
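
    As a rough sketch of that join (the layer name, an exported AssetList.csv, and the AID field spelling are assumptions; consult HUN_asset_database_doc_20160224.doc for the actual names, and note that reading a file geodatabase requires a GDAL driver such as OpenFileGDB via pyogrio or fiona), the spatial features can be linked back to their non-spatial attributes:

```python
import geopandas as gpd
import pandas as pd

# Spatial features from the GIS-only file geodatabase (layer name is an assumption).
assets_gdf = gpd.read_file(
    "HUN_asset_database_20160224_GISOnly.gdb", layer="Assets"
)

# Non-spatial AssetList table, e.g. exported from the .mdb to CSV beforehand.
asset_list = pd.read_csv("AssetList.csv")

# Join spatial features to their BA-derived attributes via the asset identifier (AID).
joined = assets_gdf.merge(asset_list, on="AID", how="left")
print(joined.head())
```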

    Some of the source data used in the compilation of this dataset is restricted.

    The public version of this asset database can be accessed via the following dataset: Asset database for the Hunter subregion on 24 February 2016 Public 20170112 v02 (https://data.gov.au/data/dataset/9d16592c-543b-42d9-a1f4-0f6d70b9ffe7)

    Dataset History

    OBJECTID VersionID Notes Date_

    1 1 Initial database. 29/08/2014

    3 1.1 Update the classification for seven identical assets from Gloucester subregion 16/09/2014

    4 1.2 Added in NSW GDEs from Hunter - Central Rivers GDE mapping from NSW DPI (50 635 polygons). 28/01/2015

    5 1.3 New AIDs assigned to NSW GDE assets (Existing AID + 20000) to avoid duplication of AIDs assigned in other databases. 12/02/2015

    6 1.4 "(1) Add 20 additional datasets required by HUN assessment project team after HUN community workshop

           (2) Turn off previous GW point assets (AIDs from 7717-7810 inclusive) 
    
           (3) Turn off new GW point asset (AID: 0)
    
           (4) Assets (AIDs: 8023-8026) are duplicated to 4 assets (AID: 4747,4745,4744,4743 respectively) in NAM subregion . Their AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry are using   
    
             values from that NAM assets.
    
          (5) Asset (AID 8595) is duplicated to 1 asset ( AID 57) in GLO subregion . Its AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry are using values from that GLO assets.
    
          (6) 39 assets (AID from 2969 to 5040) are from NAM Asset database and their attributes were updated to use the latest attributes from NAM asset database 
    
         (7)The databases, especially spatial  database, were changed such as duplicated attributes fields in spatial data were removed and only ID field is kept. The user needs to join the Table Assetlist or Elementlist to 
    
            the spatial data"  16/06/2015
    

    7 2 "(1) Updated 131 new GW point assets with previous AID and some of them may include different element number due to the change of 77 FTypes requested by Hunter assessment project team

          (2) Added 104 EPBC assets, which were assessed and excluded by ERIN
    
          (3) Merged 30 Darling Hardyhead assets to one (asset AID 60140) and deleted another 29 
    
          (4) Turned off 5 assets from community workshop (60358 - 60362) as they are duplicated to 5 assets from 104 EPBC excluded assets
    
         (5) Updated M2 test results
    
         (6) Asset Names (AID: 4743 and 4747) were changed as requested by Hunter assessment project team (4 lower cases to 4 upper case only). Those two assets are from Namoi asset database and their asset names 
    
           may not match with original names in Namoi asset database.
    
         (7)One NSW WSP asset (AID: 60814) was added in as requested by Hunter assessment project team. The process method (without considering 1:M relation) for this asset is not robust and is different to other NSW 
    
          WSP assets. It should NOT use for other subregions. 
    
         (8) Queries of Find_All_Used_Assets and Find_All_WD_Assets in the asset database can be used to extract all used assets and all water dependent assets" 20/07/2015
    

    8 2.1 "(1) There are following six assets (in Hun subregion), which is same as 6 assets in GIP subregion. Their AID, Asset Name, Group, SubGroup, Depth, Source and ListDate are using values from GIP assets. You will

             not see AIDs from AID_from_HUN in whole HUN asset datable and spreadsheet anymore and you only can see AIDs from AID_from_GIP ( Actually (a) AID 11636 is GIP got from MBC (B) only AID, Asset Name 
    
             and ListDate are different and changed)
    
          (2) For BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx, (a) Extracted long ( >255 characters) WD rationale for 19 assets (AIDs:  
    
             8682,9065,9073,9087,9088,9100,9102,9103,60000,60001,60792,60793,60801,60713,60739,60751,60764,60774,60812 ) in tab "Water-dependent asset register" and 37 assets (AIDs: 
    
             5040,8651,8677,8682,8650,8686,8687,8718,8762,9094,9065,9067,9073,9077,9081,9086,9087,9088,9100,9102,9103,60000,60001,60739,60742,60751,60713,60764,60771,
    
             60774,60792,60793,60798,60801,60809,60811,60812) in tab "Asset list" in 1.30 Excel file (b) recreated draft BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx 
    
          (3) Modified queries (Find_All_Asset_List and Find_Waterdependent_asset_register) for (2)(a)"  27/08/2015
    

    9 2.2 "(1) Updated M2 results from the internal review for 386 Sociocultural assets

          (2)Updated the class to Ecological/Vegetation/Habitat (potential species distribution) for assets/elements from sources of WAIT_ALA_ERIN, NSW_TSEC, NSW_DPI_Fisheries_DarlingHardyhead"  8/09/2015
    

    10 2.3 "(1) Updated M2 results from the internal review

               * Changed "Assessment team do not say No" to "All economic assets are by definition water dependent"

               * Changed "Assessment team say No" to "These are water dependent, but excluded by the project team because the intersection with the PAE is negligible"

               * Changed "Rivertyles" to "RiverStyles""  22/09/2015
    

    11 2.4 "(1) Updated M2 test results for 86 assets from the external review

          (2) Updated asset names for two assets (AID: 8642 and 8643) required from the external review
    
          (3) Created Draft Water Dependent Asset Register file using the template V5"  20/11/2015
    

    12 2.5 "Total number of registered water assets was increased by 1 (= +2-1) due to:

                  Two assets changed M2 test from "No" to "Yes", but one asset changed M2 test from "Yes" to "No"
    
                 from the review done by Ecologist group." 24/02/2016
    

    Dataset Citation

    Bioregional Assessment Programme (2015) Asset database for the Hunter subregion on 24 February 2016. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/a39290ac-3925-4abc-9ecb-b91e911f008f.

    Dataset Ancestors

    *

  16. CIS Graph Database and Model

    • figshare.com
    pdf
    Updated Sep 6, 2023
    Cite
    Stanislava Gardasevic (2023). CIS Graph Database and Model [Dataset]. http://doi.org/10.6084/m9.figshare.21663401.v4
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Sep 6, 2023
    Dataset provided by
    figshare
    Authors
    Stanislava Gardasevic
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is based on the model developed with the Ph.D. students of the Communication and Information Sciences Ph.D. program at the University of Hawaii at Manoa, intended to help new students get relevant information. The model was first presented at iConference 2023 in the paper "Community Design of a Knowledge Graph to Support Interdisciplinary Ph.D. Students" by Stanislava Gardasevic and Rich Gazan (available at: https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/9eebcea7-06fd-4db3-b420-347883e6379e/content).

    The database is created in Neo4j, and the .dump file can be imported into a cloud instance of this software. The dataset (.dump) contains publicly available data collected from multiple web locations and indexes of a sample of publications from the people in this domain. Besides that, it contains my (the first author's) personal graph demonstrating progress through a student's program in this degree and the activities they have done while in the program. This dataset was made possible with the huge help of my collaborator, Petar Popovic, who ingested the data into the database.

    The model and dataset were developed while involving the end users in the design and are based on the actual information needs of a population. The dataset is intended to allow researchers to investigate multigraph visualization of the data modeled by the said model.

    The knowledge graph was evaluated with the CIS student population, and the study results show that it is very helpful for decision-making, information discovery, and identification of people in one's surroundings who might be good collaborators or information points. We provide the .json file containing the Neo4j Bloom perspective with the styling and queries used in these evaluation sessions.
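
    As a minimal sketch of querying the restored database with the official Neo4j Python driver (the connection URI, credentials, and the Cypher statement are placeholders; the actual node labels come from the model described in the paper):

```python
from neo4j import GraphDatabase

# Connection details for your own Neo4j/Aura instance after importing the .dump (placeholders).
URI = "neo4j+s://<your-instance>.databases.neo4j.io"
AUTH = ("neo4j", "<password>")

# Illustrative query: list node labels and how many nodes carry each one.
CYPHER = "MATCH (n) RETURN labels(n) AS labels, count(*) AS n ORDER BY n DESC"

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    records, _, _ = driver.execute_query(CYPHER, database_="neo4j")
    for record in records:
        print(record["labels"], record["n"])
```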

  17. NOS CO-OPS Water Level Data, Preliminary, 1-Minute

    • catalog.data.gov
    Updated Jun 10, 2023
    + more versions
    Cite
    NOAA NOS COOPS (Point of Contact) (2023). NOS CO-OPS Water Level Data, Preliminary, 1-Minute [Dataset]. https://catalog.data.gov/dataset/nos-co-ops-water-level-data-preliminary-1-minute
    Explore at:
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    National Oceanic and Atmospheric Administrationhttp://www.noaa.gov/
    National Ocean Servicehttps://oceanservice.noaa.gov/
    Description

    This dataset has recent, preliminary (not quality-controlled), 1-minute, water level (tide) data from NOAA NOS Center for Operational Oceanographic Products and Services (CO-OPS).

    WARNING: These raw data have not been subjected to the National Ocean Service's quality control or quality assurance procedures and do not meet the criteria and standards of official National Ocean Service data. They are released for limited public use as preliminary data to be used only with appropriate caution.

    WARNING:
    • Queries for data MUST include stationID=, datum=, and time>=.
    • Queries for data USUALLY include time<=.
    • Queries MUST be for less than 30 days worth of data. The default time<= value corresponds to 'now'.
    • Different stations support different datums. Use ERDDAP's Subset web page to find out which datums a given station supports.
    • The data source isn't completely reliable. If your request returns no data when you think it should:
      • Make sure the station you specified supports the datum you specified.
      • Try revising the request (e.g., a different datum or a different time range).
      • The list of stations offering this data (or the list of datums) may be incorrect.
      • Sometimes a station or the entire data service is unavailable. Wait a while and try again.
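
    For illustration, a tabledap request that respects these constraints might be assembled as below; the ERDDAP base URL, dataset ID, variable names, station, and datum are placeholders to adapt, not values taken from this record (some clients also require percent-encoding of the constraint characters):

```python
import pandas as pd

# Placeholder ERDDAP endpoint and dataset ID; substitute the real ones for this dataset.
BASE = "https://<erddap-server>/erddap/tabledap/<datasetID>.csv"

# Required constraints per the warnings above: stationID, datum, and a time window under 30 days.
# Variable and column names here are assumptions; check the dataset's DDS/metadata page.
query = (
    "?time,waterLevel"
    '&stationID="9414290"'
    '&datum="MLLW"'
    "&time>=2023-06-01T00:00:00Z"
    "&time<=2023-06-02T00:00:00Z"
)

df = pd.read_csv(BASE + query, skiprows=[1])  # ERDDAP CSVs carry a units row as line 2
print(df.head())
```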

  18. Mobile Location Data | Saudi Arabia | +15M Unique Devices | +5M Daily Users...

    • datarade.ai
    .json, .csv, .xls
    Updated Mar 27, 2025
    Cite
    Quadrant (2025). Mobile Location Data | Saudi Arabia | +15M Unique Devices | +5M Daily Users | +5B Events / Month [Dataset]. https://datarade.ai/data-products/mobile-location-data-saudi-arabia-15m-unique-devices-quadrant-9de1
    Explore at:
    .json, .csv, .xlsAvailable download formats
    Dataset updated
    Mar 27, 2025
    Dataset authored and provided by
    Quadrant
    Area covered
    Saudi Arabia
    Description

    Quadrant provides insightful, accurate, and reliable mobile location data.

    Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.

    These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.

    We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points, allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplicating algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
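
    A minimal pandas sketch of the same four-attribute deduplication idea (the input file is hypothetical; column names follow the attributes listed above):

```python
import pandas as pd

events = pd.read_csv("mobility_events.csv")  # hypothetical export of the feed

# Keep a single copy of rows sharing the same Device ID, Latitude, Longitude, and Timestamp.
dedup_cols = ["Device ID", "Latitude", "Longitude", "Timestamp"]
deduped = events.drop_duplicates(subset=dedup_cols, keep="first")

print(f"removed {len(events) - len(deduped)} duplicate events")
```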

    We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.

    Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.

    Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.

  19. NOAA National Water Model Reanalysis Data at RENCI

    • hydroshare.org
    • beta.hydroshare.org
    • +2more
    zip
    Updated Oct 5, 2023
    Cite
    Mike Johnson; David Blodgett (2023). NOAA National Water Model Reanalysis Data at RENCI [Dataset]. http://doi.org/10.4211/hs.a1e329ad20654e72b7b423f991bf9251
    Explore at:
    zip(3.5 KB)Available download formats
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    HydroShare
    Authors
    Mike Johnson; David Blodgett
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1993 - Dec 31, 2018
    Area covered
    Description

    This data release provides the reanalysis streamflow data from versions 1.2, 2.0, and 2.1 of the National Water Model, structured for timeseries extraction. As a result, users can query the time series for a given NHDPlusV2 COMID without downloading the hourly CONUS files and extracting the relevant values themselves.

    The data is hosted on the RENCI THREDDS Data Server and is accessible via OPeNDAP at the following URLs:

    Version 1.2 (https://thredds.hydroshare.org/thredds/catalog/nwm/retrospective/catalog.html?dataset=NWM_Retrospective/nwm_retro_full.ncml) - Spans 1993-01-01 00:00:00 to 2017-12-31 23:00:00 - Contains 219,144 hourly time steps for - 2,729,077 NHD reaches

    Version 2.0 (https://thredds.hydroshare.org/thredds/catalog/nwm/retrospective/catalog.html?dataset=NWM_Retrospective/nwm_v2_retro_full.ncml) - Spans 1993-01-01 00:00:00 to 2018-12-31 00:00:00 - Contains 227,903 hourly time steps for - 2,729,076 NHD reaches

    Version 2.1 (https://cida.usgs.gov/thredds/catalog/demo/morethredds/nwm/nwm_v21_retro_full.ncml) - Spans 1979-02-02 18:00:00 to 2020-12-31 00:00:00 - Contains 227,903 hourly time steps for - 2,729,076 NHD reaches

    Raw Data (https://registry.opendata.aws/nwm-archive/) - 227,000+ hourly netCDF files (depending on version)

    DDS

    The data description structure (DDS) can be viewed at the NcML page for each respective resource (linked above). More broadly each resource includes:

    • A 1D time array - hours since 1970-01-01 00:00
    • A 1D latitude array - coordinate (Y) information
    • A 1D longitude array - coordinate (X) information WGS84
    • A 1D feature_id array - NHDPlus V2 COMID (NWM forecast ID)
    • A 2D streamflow array - Q (cms) [feature_id, time]
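
As a sketch of such a query (assuming xarray with OPeNDAP support is installed, that feature_id is exposed as an indexed coordinate, and that the dodsC endpoint below, derived from the catalog link above, is correct; verify the exact path on the THREDDS catalog page):

```python
import xarray as xr

# OPeNDAP endpoint for the v2.0 NcML resource (assumed form; check the catalog page).
URL = ("https://thredds.hydroshare.org/thredds/dodsC/"
       "nwm/retrospective/nwm_v2_retro_full.ncml")

ds = xr.open_dataset(URL)  # lazy open; nothing is downloaded yet

comid = 101  # illustrative NHDPlusV2 COMID
q = ds["streamflow"].sel(feature_id=comid)  # only this reach's series is fetched

print(q.sizes)           # number of hourly time steps for this reach
print(float(q.mean()))   # mean discharge (cms) over the record
```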

    R package

    The nwmTools R package provides easier interaction with the OPeNDAP resources. Package documentation can be found here and the GitHub repository here.

    Collaborators:

    Mike Johnson, David Blodgett

    Support:

    This effort is supported by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. under the HydroInformatics Fellowship. See program here

    Publications

    J.M. Johnson, David L. Blodgett, Keith C. Clarke, Jon Pollack. (2020). "Restructuring and serving web-accessible streamflow data from the NOAA National Water Model historic simulations". Nature Scientific Data. (In Review)

  20. Data from: DEEPEN: Final 3D PFA Favorability Models and 2D Favorability Maps...

    • catalog.data.gov
    • gdr.openei.org
    • +2more
    Updated Jan 20, 2025
    Cite
    National Renewable Energy Laboratory (2025). DEEPEN: Final 3D PFA Favorability Models and 2D Favorability Maps at Newberry Volcano [Dataset]. https://catalog.data.gov/dataset/deepen-final-3d-pfa-favorability-models-and-2d-favorability-maps-at-newberry-volcano-2a96b
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Area covered
    Newberry Volcano
    Description

    Part of the DEEPEN (DE-risking Exploration of geothermal Plays in magmatic ENvironments) project involved developing and testing a methodology for a 3D play fairway analysis (PFA) for multiple play types (conventional hydrothermal, superhot EGS, and supercritical). This was tested using new and existing geoscientific exploration datasets at Newberry Volcano. This GDR submission includes images, data, and models related to the 3D favorability and uncertainty models and the 2D favorability and uncertainty maps.

    The DEEPEN PFA Methodology, detailed in the journal article below, is based on the method proposed by Poux & O'Brien (2020), which uses the Leapfrog Geothermal software with the Edge extension to conduct PFA in 3D. This method uses all available data to build a 3D geodata model which can be broken down into smaller blocks and analyzed with advanced geostatistical methods. Each data set is imported into a 3D model in Leapfrog and divided into smaller blocks. Conditional queries can then be used to assign each block an index value which conditionally ranks each block's favorability, from 0-5 with 5 being most favorable, for each model (e.g., lithologic, seismic, magnetic, structural). The values between 0-5 assigned to each block are referred to as index values. The final step of the process is to combine all the index models to create a favorability index. This involves multiplying each index model by a given weight and then summing the resulting values.

    The DEEPEN PFA Methodology follows this approach, but split up by the specific geologic components of each play type. These components are defined as follows for each magmatic play type:

    1. Conventional hydrothermal plays in magmatic environments: Heat, fluid, and permeability
    2. Superhot EGS plays: Heat, thermal insulation, and producibility (the ability to create and sustain fractures suitable for an EGS reservoir)
    3. Supercritical plays: Heat, supercritical fluid, pressure seal, and producibility (the proper permeability and pressure conditions to allow production of supercritical fluid)

    More information on these components and their development can be found in Kolker et al. (2022). For the purposes of subsurface imaging, it is easier to detect a permeable fluid-filled reservoir than it is to detect separate fluid and permeability components. Therefore, in this analysis, we combine fluid and permeability for conventional hydrothermal plays, and supercritical fluid and producibility for supercritical plays. We also project the 3D favorability volumes onto 2D surfaces for simplified joint interpretation, and we incorporate an uncertainty component. Uncertainty was modeled using the best approach for the dataset in question, for the datasets where we had enough information to do so. Identifying which subsurface parameters are the least resolved can help qualify current PFA results and focus future efforts in data collection. Where possible, the resulting uncertainty models/indices were weighted using the same weights applied to the respective datasets, and summed, following the PFA methodology above, but for uncertainty.

Cite
Maurice Vanderfeesten; Maurice Vanderfeesten; Linda Hasse; Linda Hasse (2020). Text Analyses of Survey Data on "Mapping Research Output to the Sustainable Development Goals (SDGs)" [Dataset]. http://doi.org/10.5281/zenodo.3832090

Text Analyses of Survey Data on "Mapping Research Output to the Sustainable Development Goals (SDGs)"

Related Article
Explore at:
zipAvailable download formats
Dataset updated
May 20, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Maurice Vanderfeesten; Maurice Vanderfeesten; Linda Hasse; Linda Hasse
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This package contains data on five text analysis types (term extraction, contract analysis, topic modeling, network mapping), based on the survey data where researchers selected research output that are related to the 17 Sustainable Development Goals (SDGs). This is used as input to improve the current SDG classification model v4.0 to v5.0

Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each of the goals specific targets and indicators are mentioned to monitor the progress of reaching those goals by 2030. In an effort to capture how research is contributing to move the needle on those challenges, we earlier have made an initial classification model than enables to quickly identify what research output is related to what SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)

The initiative started from the Aurora Universities Network in 2017, in the working group "Societal Impact and Relevance of Research", to investigate and to make visible 1. what research is done that are relevant to topics or challenges that live in society (for the proof of practice this has been scoped down to the SDGs), and 2. what the effect or impact is of implementing those research outcomes to those societal challenges (this also have been scoped down to research output being cited in policy documents from national and local governments an NGO's).

Context of this dataset | classification model improvement workflow

The classification model we have used are 17 different search queries on the Scopus database.

Methods used to do the text analysis

  1. Term Extraction: after text normalisation (stemming, etc) we extracted 2 terms in bigrams and trigrams that co-occurred the most per document, in the title, abstract and keyword
  2. Contrast analysis: the co-occurring terms in publications (title, abstract, keywords), of the papers that respondents have indicated relate to this SDG (y-axis: True), and that have been rejected (x-axis: False). In the top left you'll see term co-occurrences that a clearly relate to this SDG. The bottom-right are terms that are appear in papers that have been rejected for this SDG. The top-right terms appear frequently in both and cannot be used to discriminate between the two groups.
  3. Network map: This diagram shows the cluster-network of terms co-occurring in the publications related to this SDG, selected by the respondents (accepted publications only).
  4. Topic model: This diagram shows the topics, and the related terms that make up that topic. The number of topics is related to the number of of targets of this SDG.
  5. Contingency matrix: This diagram shows the top 10 of co-occurring terms that correlate the most.

Software used to do the text analyses

CorTexT: The CorTexT Platform is the digital platform of the LISIS Unit and a project launched and sustained by IFRIS and INRAE. The platform aims at empowering open research and studies in the humanities on the dynamics of science, technology, innovation, and knowledge production.

Resource with interactive visualisations

Based on the text analysis data we have created a website that brings all the SDG interactive diagrams together, for you to scroll through: https://sites.google.com/vu.nl/sdg-survey-analysis-results/

Data set content

In the dataset root you'll find the following folders and files:

  • /sdg01-17/
    • This contains the text analysis for all the individual SDG surveys.
  • /methods/
    • This contains the step-by-step explanations of the text analysis methods using Cortext.
  • /images/
    • images of the results used in this README.md.
  • LICENSE.md
    • terms and conditions for reusing this data.
  • README.md
    • description of the dataset; each sub-folder contains a README.md file to further describe its content.

Inside an /sdg01-17/-folder you'll find the following:

  • This contains the step-by-step explanations of the text analysis methods using Cortext.
  • /sdg01-17/sdg04-sdg-survey-selected-publications-combined.db
    • This contains the title, abstract, and keywords of the publications in the survey, including the accept or reject status and the number of respondents.
  • /sdg01-17/sdg04-sdg-survey-selected-publications-combined-accepted-accepted-custom-filtered.db
    • same as above, but only the accepted papers
  • /sdg01-17/extracted-terms-list-top1000.csv
    • the aggregated list of co-occurring terms (bigrams and trigrams) extracted per paper.
  • /sdg01-17/contrast-analysis/
    • This contains the data and visualisation of the terms appearing in papers that have been accepted (true) and rejected (false) to be relating to this SDG.
  • /sdg01-17/topic-modelling/
    • This contains the data and visualisation of the terms clustered in the same number of topics as there are 'targets' within that SDG.
  • /sdg01-17/network-mapping/
    • This contains the data and visualisation of the terms clustered by proximity of co-occurrence in papers.
  • /sdg01-17/contingency-matrix/
    • This contains the data and visualisation of the top 10 co-occurring terms.

note: the .csv files are actually tab-separated.

Contribute and improve the SDG Search Queries

We welcome you to join the Github community and to fork, branch, improve and make a pull request to add your improvements to the new version of the SDG queries. https://github.com/Aurora-Network-Global/sdg-queries
