Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This package contains data from several text analysis types (term extraction, contract analysis, topic modeling, network mapping), based on survey data in which researchers selected research outputs related to the 17 Sustainable Development Goals (SDGs). These data are used as input to improve the current SDG classification model from v4.0 to v5.0.
The Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each goal, specific targets and indicators are defined to monitor progress towards reaching those goals by 2030. In an effort to capture how research is contributing to moving the needle on those challenges, we earlier built an initial classification model that makes it possible to quickly identify which research output is related to which SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)
The initiative started in 2017 within the Aurora Universities Network, in the working group "Societal Impact and Relevance of Research", to investigate and make visible 1. what research is being done that is relevant to topics or challenges that live in society (for the proof of practice this has been scoped down to the SDGs), and 2. what the effect or impact is of implementing those research outcomes on those societal challenges (this has also been scoped down to research output being cited in policy documents from national and local governments and NGOs).
Context of this dataset | classification model improvement workflow
The classification model we used consists of 17 different search queries on the Scopus database.
Methods used to do the text analysis
Software used to do the text analyses
CorTexT: The CorTexT Platform is the digital platform of the LISIS Unit and a project launched and sustained by IFRIS and INRAE. This platform aims at empowering open research and studies in the humanities on the dynamics of science, technology, innovation and knowledge production.
Resource with interactive visualisations
Based on the text analysis data we have created a website that brings all the SDG interactive diagrams together for you to scroll through: https://sites.google.com/vu.nl/sdg-survey-analysis-results/
Data set content
In the dataset root you'll find the following folders and files:
Inside an /sdg01-17/-folder you'll find the following:
note: the .csv files are actually tab-separated.
Contribute and improve the SDG Search Queries
We welcome you to join the Github community and to fork, branch, improve and make a pull request to add your improvements to the new version of the SDG queries. https://github.com/Aurora-Network-Global/sdg-queries
There are two objectives of this shape retrieval contest: a) to evaluate partial similarity between query and target objects and retrieve complete 3D models that are relevant to a partial query object; b) to retrieve 3D models that are relevant to a query depth map. This task corresponds to a real-life scenario where the query is a 3D range scan of an object acquired from an arbitrary view direction. The algorithm should retrieve the relevant 3D objects from a database.
Task description: In response to a given set of queries, the task is to evaluate similarity scores with the target models and return an ordered ranked list along with the similarity scores for each query. The set of queries consists either of partial 3D models or of range images. Participants may submit ranked lists for either of the query sets or both; there is no obligation to submit ranked lists for both query sets.
Dataset: The first query set consists of 20 partial 3D models obtained by cutting parts from complete models. The objective is to retrieve the models to which the query part may belong. The file format for the partial query models is the ASCII Object File Format (.off). The second query set is composed of 20 range images, acquired by capturing range data of 20 models from arbitrary view directions using a desktop 3D scanner. The file format is again the ASCII Object File Format (.off), representing each scan as a triangular mesh. The target database is the same for both query sets and contains 720 complete 3D models categorized into 40 classes, with 18 models per class. The file format for the 3D models is the ASCII Object File Format (*.off).
Contents: 3D models, classification files, evaluation software, and images.
Paper: Dutagaci, Helin, Godil, Afzal, et al. "SHREC'09 track: querying with partial models." Proceedings of the 2nd Eurographics conference on 3D Object Retrieval. Eurographics Association, 2009. https://doi.org/10.5555/2381128.2381144
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COKI Language Dataset contains predictions for 122 million academic publications. The dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
Methodology
A subset of the COKI Academic Observatory Dataset, which is produced by the Academic Observatory Workflows codebase [1], was extracted and converted to CSV with Bigquery and downloaded to a virtual machine. The subset consists of all publications with DOIs in our dataset, including each publication’s title and abstract from both Crossref Metadata and Microsoft Academic Graph. The CSV files were then processed with a Python script. The titles and abstracts for each record were pre-processed, concatenated together and analysed with fastText. The titles and abstracts from Crossref Metadata were used first, with the MAG titles and abstracts serving as a fallback when the Crossref Metadata information was empty. Language was predicted for each publication using the fastText lid.176.bin language identification model [2]. fastText was chosen because of its high accuracy and fast runtime speed [3]. The final output dataset consists of DOI, title, ISO language code and the fastText language prediction probability score.
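As an illustration of the prediction step, the following is a minimal Python sketch using the fastText lid.176.bin model [2] on a concatenated title and abstract. The function name and the simplified fallback logic are assumptions for illustration, not the pipeline's actual code.

import fasttext

# Load the pre-trained language identification model (lid.176.bin, see [2]).
model = fasttext.load_model("lid.176.bin")

def predict_language(title, abstract, mag_title=None, mag_abstract=None):
    # Crossref Metadata title/abstract first, MAG as fallback when empty.
    text = " ".join(filter(None, [title, abstract]))
    if not text.strip():
        text = " ".join(filter(None, [mag_title, mag_abstract]))
    # fastText expects a single line of text.
    text = text.replace("\n", " ").strip()
    labels, probs = model.predict(text)
    # Labels look like '__label__en'; strip the prefix to get the ISO code.
    return labels[0].replace("__label__", ""), float(probs[0])

print(predict_language("A study of coral reefs", "We analyse reef bleaching over time."))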
Query or Download
The data is publicly accessible in BigQuery in the following two tables:
When you make queries on these tables, make sure that you are in your own Google Cloud project, otherwise the queries will fail.
See the COKI Language Detection README for instructions on how to download the data from Zenodo and load it into BigQuery.
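For reference, here is a minimal Python sketch of running a query from your own Google Cloud project with the BigQuery client library. The project id, dataset/table name, and column names are placeholders based on the dataset description, not the actual table identifiers.

from google.cloud import bigquery

# Run the query from *your own* Google Cloud project, as noted above.
client = bigquery.Client(project="your-gcp-project-id")  # placeholder project id

# Placeholder table reference and column names (DOI, title, ISO language code,
# prediction probability) -- substitute the actual table you loaded or queried.
sql = """
    SELECT doi, title, language_code, probability
    FROM `your-gcp-project-id.coki.language`
    LIMIT 10
"""

for row in client.query(sql).result():
    print(row.doi, row.language_code, row.probability)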
Code
The code that generated this dataset, the BigQuery schemas and instructions for loading the data into BigQuery can be found here: https://github.com/The-Academic-Observatory/coki-language
License
COKI Language Dataset © 2022 by Curtin University is licensed under CC BY 4.0.
Attributions
This work contains information from:
References
[1] https://doi.org/10.5281/zenodo.6366695
[2] https://fasttext.cc/docs/en/language-identification.html
[3] https://modelpredict.com/language-identification-survey
The MAST Archive at STScI TAP endpoint for observational data, saved in the Common Archive Data Model format and made available through the ObsCore limited view. The Table Access Protocol (TAP) lets you execute queries against our database tables and inspect various metadata. Upload is not currently supported. Missions and projects with data available through the CAOMTAP service include: BEFS, EUVE, FUSE, GALEX, HLA, HST, HUT, IUE, JWST, K2, KEPLER, PS1 (PanSTARRS 1) Data Release 2, SPITZER_SHA, SWIFT, TESS, TUES, WUPPE.
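As a rough illustration, the service can be queried from Python with a generic TAP client such as pyvo. The endpoint URL below is a placeholder (use the CAOMTAP URL published by MAST); ivoa.obscore and obs_collection are the standard ObsCore table and column names.

from pyvo.dal import TAPService

# Placeholder: substitute the MAST CAOMTAP endpoint URL advertised by the archive.
service = TAPService("https://<mast-caomtap-endpoint>")

# ObsCore limited view: ivoa.obscore is the standard table name and
# obs_collection holds the mission/project (e.g. TESS, JWST, HST).
results = service.search("""
    SELECT TOP 10 obs_id, obs_collection, target_name, t_min, t_max
    FROM ivoa.obscore
    WHERE obs_collection = 'TESS'
""")
print(results.to_table())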
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
THIS VERSION IS OUTDATED, PLEASE CHECK OUT THE LAST VERSION HERE: https://zenodo.org/record/6771217
This repository contains result data for the paper "Open modeling of electricity and heat demand curves for all residential buildings in Germany".
The published data includes residential electricity and heat demand profiles for every building in Germany. It was created with the open source tool eGon-data within the research project eGon. All input data sets as well as the code are available under open source licenses.
Files
Database structure
After restoring the backup file, the data is stored in different schemas: society, openstreetmap and demand. Different tables have to be combined to create the final demand time series for heat and electricity. In the following, the tables and the matching methods are described.
The schema society includes data from Census 2011 on population in 100m x 100m cells ('Census cells'). The cells are georeferenced and have a unique id.
Schema: society
Schema: openstreetmap
The schema openstreetmap includes data on residential buildings. All buildings hold an internal building_id. All residential buildings extracted from OpenStreetMap are stored in openstreetmap.osm_buildings_residential, including osm_id and the internal building_id. Additional synthetic buildings are stored in openstreetmap.osm_buildings_synthetic.
Schema: demand
With the profile_ids in egon_household_electricity_profile_of_buildings, specific profiles from iee_household_load_profiles are mapped to all residential buildings. The profiles then need to be scaled by their annual sum and the corresponding scaling factors, which can be found in egon_household_electricity_profile_in_census_cell and matched per census cell id.
Heat demand profiles per building can be created by combining the tables egon_peta_heat, heat_idp_pool and heat_timeseries_selected_profiles. In addition, weather data (e.g. from ERA5, located in additional_data/) is needed to distribute the annual heat demands to single days. This is included in the example script, the usage is described below.
Weather data and the used climate zones are not included in the database. They are stored in files which are part of the additional_data/ folder. In this folder, you find the following data sets:
Example queries
Electricity profiles: The demand profiles for residential buildings can be obtained using the tables stored in the demand schema. To extract electricity demand profiles, the following tables have to be combined:
Example script to obtain the electrical demand timeseries for 1 specific building for the eGon2035 scenario:
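The original example script is distributed with the dataset; the following is only a minimal Python sketch of the lookup described above, using psycopg2 and pandas. The table names come from this description, but the connection details and the column names (building_id, profile_id, cell_id) are assumptions and may differ from the actual schema.

import pandas as pd
import psycopg2

# Connect to the restored PostgreSQL backup (placeholder credentials/database name).
conn = psycopg2.connect("dbname=egon_data user=user password=password host=localhost")

building_id = 12345  # hypothetical internal building_id

# Profiles assigned to this building (column names are assumptions).
mapping = pd.read_sql(
    "SELECT * FROM demand.egon_household_electricity_profile_of_buildings "
    "WHERE building_id = %(b)s",
    conn, params={"b": building_id})

# The matched profiles are then read from demand.iee_household_load_profiles and
# scaled by their annual sum and the eGon2035 scaling factors in
# demand.egon_household_electricity_profile_in_census_cell (matched per cell id),
# exactly as described above.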
According to our latest research, the global database development and management tools software market size reached USD 15.8 billion in 2024, reflecting robust demand across diverse sectors. The market is anticipated to expand at a CAGR of 13.2% during the forecast period, propelling the market to an estimated USD 44.2 billion by 2033. This impressive growth is driven by the escalating need for efficient data management, the proliferation of cloud-based solutions, and the increasing complexity of enterprise data environments. As organizations worldwide continue to digitize their operations and harness big data analytics, the demand for advanced database development and management tools software is set to surge.
One of the primary growth factors for the database development and management tools software market is the exponential increase in data volumes generated by businesses, governments, and individuals alike. The digital transformation wave sweeping across industries necessitates robust solutions for storing, organizing, and retrieving vast datasets with high reliability and speed. Organizations are increasingly leveraging data-driven insights to enhance decision-making, optimize operations, and personalize customer experiences. This reliance on data has compelled enterprises to invest in sophisticated database development and management tools that can handle complex queries, streamline data modeling, and ensure data integrity. As a result, both established enterprises and emerging startups are prioritizing investments in this market, further fueling its expansion.
Another significant driver of market growth is the rapid adoption of cloud computing technologies. Cloud-based database management solutions offer unparalleled scalability, flexibility, and cost-effectiveness compared to traditional on-premises systems. With organizations seeking to minimize IT infrastructure costs and improve accessibility, cloud deployment models are gaining substantial traction. This shift is particularly pronounced among small and medium enterprises (SMEs), which benefit from the reduced upfront investment and operational agility provided by cloud solutions. Additionally, the integration of artificial intelligence and machine learning capabilities into database tools is enabling automated performance monitoring, predictive maintenance, and advanced security management, further enhancing the value proposition of these solutions.
The growing emphasis on data security and regulatory compliance is also shaping the trajectory of the database development and management tools software market. With the rising incidence of cyberattacks and stringent data protection regulations such as GDPR, HIPAA, and CCPA, organizations are under pressure to safeguard sensitive information and ensure compliance. Advanced database management tools now incorporate robust security features, including encryption, access controls, and real-time threat detection, to address these concerns. Vendors are continuously innovating to provide end-to-end security management and automated compliance reporting, making their solutions indispensable for businesses operating in highly regulated industries such as BFSI, healthcare, and government.
Regionally, North America continues to dominate the market, accounting for the largest revenue share in 2024, followed closely by Europe and the Asia Pacific. The presence of leading technology providers, early adoption of digital technologies, and a strong focus on innovation contribute to North America's leadership. Meanwhile, the Asia Pacific region is experiencing the fastest growth, driven by rapid industrialization, increasing IT investments, and the proliferation of cloud-based services in emerging economies such as China and India. Europe maintains a steady growth trajectory, supported by stringent data protection regulations and a mature enterprise IT landscape. Latin America and the Middle East & Africa are also witnessing increased adoption, albeit at a slower pace, as organizations in these regions gradually embrace digital transformation.
This raster file represents land within the Mountain Home study boundary classified as either “irrigated” with a cell value of 1 or “non-irrigated” with a cell value of 0 at a 10-meter spatial resolution. These classifications were determined at the pixel level by use of Random Forest, a supervised machine learning algorithm. Classification models often employ Random Forest due to its accuracy and efficiency at labeling large spatial datasets. To build a Random Forest model and supervise the learning process, IDWR staff create pre-labeled data, or training points, which are used by the algorithm to construct decision trees that will later be used on unseen data. Model accuracy is determined using a subset of the training points, otherwise known as a validation dataset.
Several satellite-based input datasets are made available to the Random Forest model, which aid in distinguishing characteristics of irrigated lands. These characteristics allow patterns to be established by the model, e.g., high NDVI during summer months for cultivated crops, or consistently low ET for dryland areas. Mountain Home Irrigated Lands 2023 employed the following input datasets: US Geological Survey (USGS) products, including Landsat 8/9 and the 10-meter 3DEP DEM, and European Space Agency (ESA) Copernicus products, including Harmonized Sentinel-2 and Global 30m Height Above Nearest Drainage (HAND). For the creation of manually labeled training points, IDWR staff accessed the following datasets: NDVI derived from Landsat 8/9, Sentinel-2 CIR imagery, the US Department of Agriculture National Agricultural Statistics Service (USDA NASS) Cropland Data Layer, Active Water Rights Place of Use data from IDWR, and USDA’s National Agriculture Imagery Program (NAIP) imagery. All datasets were available for the current year of interest (2023).
The published Mountain Home Irrigated Lands 2023 land classification raster was generated after four model runs, where at each iteration IDWR staff added or removed training points to help improve results. Early model runs showed poor results in riparian areas near the Snake River, concentrated animal feeding operations (CAFOs), and non-irrigated areas at higher elevations. These issues were resolved after several model runs in combination with post-processing masks. Masks used include Fish and Wildlife Service’s National Wetlands Inventory (FWS NWI) data. These data were amended to exclude polygons overlying irrigated areas, and to expand riparian area in specific locations. A manually created mask was primarily used to fill in areas around the Snake River that the model did not uniformly designate as irrigated. Ground-truthing and a thorough review of IDWR’s water rights database provided further insight for class assignments near the town of Mayfield. Lastly, the Majority Filter tool in ArcGIS was applied using a kernel of 8 nearest neighbors to smooth out “speckling” within irrigated fields. The masking datasets and the final iteration of training points are available on request.
Information regarding Sentinel and Landsat imagery: All satellite data products used within the Random Forest model were accessed via the Google Earth Engine API.
To find more information on the Sentinel data used, query the Earth Engine Data Catalog (https://developers.google.com/earth-engine/datasets) using “COPERNICUS/S2_SR_HARMONIZED.” Information on the Landsat datasets used can be found by querying “LANDSAT/LC08/C02/T1_L2” (for Landsat 8) and “LANDSAT/LC09/C02/T1_L2” (for Landsat 9). Each satellite product has several bands of available data. For our purposes, shortwave infrared 2 (SWIR2), blue, Normalized Difference Vegetation Index (NDVI), and near infrared (NIR) were extracted from both Sentinel and Landsat images. These images were later interpolated to the following dates: 2023-04-15, 2023-05-15, 2023-06-14, 2023-07-14, 2023-08-13, 2023-09-12. Interpolated values were taken from up to 45 days before and after each interpolated date. April-June interpolated Landsat images, as well as the April interpolated Sentinel image, were not used in the model given the extent of cloud cover overlying irrigated area. For more information on the pre-processing of satellite data used in the Random Forest model, please reach out to IDWR at gisinfo@idwr.idaho.gov.
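For orientation only, here is a minimal Earth Engine Python API sketch that pulls the harmonized Sentinel-2 collection named above and derives an NDVI composite around one of the interpolation dates. The study-area rectangle and the simple median composite are stand-ins, not the IDWR pre-processing workflow.

import ee

ee.Initialize()

# Placeholder study area -- substitute the Mountain Home study boundary geometry.
aoi = ee.Geometry.Rectangle([-116.4, 42.9, -115.5, 43.4])

def add_ndvi(img):
    # Sentinel-2 bands: B8 = NIR, B4 = red.
    return img.addBands(img.normalizedDifference(["B8", "B4"]).rename("NDVI"))

s2 = (ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
      .filterBounds(aoi)
      .filterDate("2023-05-30", "2023-08-28")   # roughly +/- 45 days around 2023-07-14
      .map(add_ndvi))

ndvi_composite = s2.select("NDVI").median().clip(aoi)
print(ndvi_composite.getInfo())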
Are you looking for data that tell if the companies or persons you look into own any patents? If they do, do you want to know how many patents they own?
The Assignee Query Data will provide you with timely and comprehensive results on the global patent ownership of companies or individuals, with 50 years of history.
How do we do that?
We include decades’ worth of global full-text databases, such as the US, China, EM/EUIPO, Japan, Korea, WIPO and so on, and keep them updated on a timely basis—as frequently as every day or week, depending on the sources.
Furthermore, the downloaded data are cleansed to minimize data errors and thus search and analysis errors. For example, we standardize assignee names to enable individual patents to correspond to a single owner; logic-based corrections ensure that values are corrected based on rules.
In addition, we use advanced algorithms to analyze, select, and present the most current and accurate information from multiple available data sources. For instance, a single patent's legal status is triangulated across different patent data sources for accuracy. Moreover, proprietary Quality and Value rankings subject patents in each key market to the same evaluation process, offering predictions of each patent's likelihood of validity and monetization.
Objective: The objective of this track is to evaluate the performance of 3D shape retrieval approaches on a large-scale comprehensive 3D shape database which contains different types of models, such as generic, articulated, CAD and architecture models.
Introduction: With the increasing number of 3D models created every day and stored in databases, the development of effective and scalable 3D search algorithms has become an important research area. In this contest, the task is to retrieve 3D models similar to a complete 3D model query from a new integrated large-scale comprehensive 3D shape benchmark including various types of models. Owing to the integration of the most important existing benchmarks to date, the newly created benchmark is the most exhaustive to date in terms of the number of semantic query categories covered, as well as the variation of model types. The shape retrieval contest will allow researchers to evaluate results of different 3D shape retrieval approaches when applied to a large-scale comprehensive 3D database. The benchmark is motivated by a recent large collection of human sketches built by Eitz et al. [1]. To explore how humans draw sketches and to study human sketch recognition, they collected 20,000 human-drawn sketches, categorized into 250 classes, each with 80 sketches. This sketch dataset is exhaustive in terms of the number of object categories. Thus, we believe that a 3D model retrieval benchmark based on their object categorization is more comprehensive and appropriate than currently available 3D retrieval benchmarks for objectively and accurately evaluating the real practical performance of a comprehensive 3D model retrieval algorithm if implemented and used in the real world. Considering this, we built the SHREC'14 Large Scale Comprehensive Track Benchmark (SHREC14LSGTB) by collecting relevant models from the major previously proposed 3D object retrieval benchmarks. Our target was to find models for as many of the 250 classes as possible, and as many models as possible for each class. These previous benchmarks were compiled with different goals in mind and had not, to date, been considered in their sum. Our work is the first to integrate them to form a new, larger benchmark corpus for comprehensive 3D shape retrieval.
Dataset: The SHREC'14 Large Scale Comprehensive Retrieval Track Benchmark has 8,987 models, categorized into 171 classes. We adopted a voting scheme to classify models. For each classification, we have at least two votes. If these two votes agree with each other, we confirm that the classification is correct; otherwise, we perform a third vote to finalize the classification. All the models are categorized according to the classifications in Eitz et al. [1], based on visual similarity.
Evaluation Method: To provide a comprehensive evaluation of the retrieval algorithms, we employ seven commonly adopted performance metrics from the 3D model retrieval literature.
Please cite the papers: [1] Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Martin Burtscher, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Hongbo Fu, Takahiko Furuya, Haisheng Li, Jianzhuang Liu, Henry Johan, Ryuichi Kosaka, Hitoshi Koyanagi, Ryutarou Ohbuchi, Atsushi Tatsuma, Yajuan Wan, Chaoli Zhang, Changqing Zou. A Comparison of 3D Shape Retrieval Methods Based on a Large-scale Benchmark Supporting Multimodal Queries. Computer Vision and Image Understanding, November 4, 2014.
[2] Bo Li, Yijuan Lu, Chunyuan Li, Afzal Godil, Tobias Schreck, Masaki Aono, Qiang Chen, Nihad Karim Chowdhury, Bin Fang, Takahiko Furuya, Henry Johan, Ryuichi Kosaka, Hitoshi Koyanagi, Ryutarou Ohbuchi, Atsushi Tatsuma. SHREC' 14 Track: Large Scale Comprehensive 3D Shape Retrieval. Eurographics Workshop on 3D Object Retrieval 2014 (3DOR 2014): 131-140, 2014.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Step-by-step instructions have been extracted from wikiHow in 16 different languages and decomposed into a formal graph representation like the one shown in the picture below. The source pages from which the instructions were extracted have also been collected and can be shared upon request.
Instructions are represented in RDF following the PROHOW vocabulary and data model. For example, the category, steps, requirements and methods of each set of instructions have been extracted.
This dataset has been produced as part of the The Web of Know-How project.
The large amount of data can make it difficult to work with this dataset. This is why an instruction-extraction python script was developed. This script allows you to:
The class_hierarchy.ttl file attached to this dataset is used to determine whether a set of instructions falls under a certain category or not. The script is available on this GitHub repository.
This page contains the link to the different language versions of the data.
A previous version of this type of data, although for English only, is also available on Kaggle:
For the multilingual dataset, this is the list of the available languages and number of articles in each:
The dataset is in RDF and it can be queried in SPARQL. Sample SPARQL queries are available in this GitHub page.
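As a starting point, here is a small Python/rdflib sketch for loading one of the RDF files and running a SPARQL query. The file name is a placeholder, and the PROHOW namespace and property name (has_step) are assumptions based on the vocabulary description, so check them against the sample queries on the GitHub page.

from rdflib import Graph

g = Graph()
g.parse("instructions_en.ttl", format="turtle")  # placeholder file name

# Hypothetical PROHOW property -- verify the exact vocabulary terms before use.
query = """
PREFIX prohow: <http://w3id.org/prohow#>
SELECT ?instructions ?step
WHERE {
    ?instructions prohow:has_step ?step .
}
LIMIT 10
"""

for row in g.query(query):
    print(row.instructions, row.step)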
Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.
e.g. (Win) C:\> crab C:\some\path\MyProject
or (Mac) $ crab /some/path/MyProject
You get a CRAB> prompt where you can enter SQL queries on the data, e.g. Count files by extension
SELECT extension, count(*)
FROM files
GROUP BY extension;
e.g. List the 5 biggest directories
SELECT parentpath, sum(bytes)/1e9 as GB
FROM files
GROUP BY parentpath
ORDER BY sum(bytes) DESC LIMIT 5;
Crab provides a virtual table, fileslines, which exposes file contents to SQL
e.g. Count TODO and FIXME entries in any .c files, recursively
SELECT fullpath, count(*) FROM fileslines
WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c'
and (data like '%TODO%' or data like '%FIXME%')
GROUP BY fullpath;
There are also functions to run programs or shell commands on any subset of files, or lines within files, e.g. (Mac) unzip all the .zip files, recursively
SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/')
FROM files
WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';
(Here -n tells unzip not to overwrite anything, and -d specifies target directory)
There is also a function to write query output to file, e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file
SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data)
FROM fileslines
WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt'
ORDER BY data;
In place of the interactive prompt you can run queries in batch mode, e.g. here is a one-liner that returns the full path of all the files in the current directory
C:\> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"
Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.
Crab is free for personal use, $5/mo commercial
See more details here (Mac): http://etia.co.uk/ or here (Win): http://etia.co.uk/win/about/
An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.
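If you would rather poke at the example database from Python than from the Crab shell, a short sqlite3 sketch (standard library only) reproduces the count-by-extension query above. The file name database.sqlite is as uploaded; adjust the path to wherever you saved it. Note that the virtual tables and support functions are only available in the Crab shell, so plain SQLite can query the files table but not fileslines.

import sqlite3

# Open the uploaded example database (adjust the path as needed).
conn = sqlite3.connect("database.sqlite")

# Same query as the interactive example above: count files by extension.
rows = conn.execute(
    "SELECT extension, count(*) FROM files "
    "GROUP BY extension ORDER BY count(*) DESC LIMIT 10").fetchall()

for extension, n in rows:
    print(extension, n)

conn.close()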
To scan your own files, and get access to the virtual tables and support functions, you have to use the Crab SQLite shell, available for download from this page (Mac): http://etia.co.uk/download/ or this page (Win): http://etia.co.uk/win/download/
The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'
COLUMNS
fileid (int) primary key -- files table row number, a unique id for each item
name (text) -- item name e.g. 'Hei.ttf'
bytes (int) -- item size in bytes e.g. 7502752
depth (int) -- how far scan recursed to find the item, starts at 0
accessed (text) -- datetime item was accessed
modified (text) -- datetime item was modified
basename (text) -- item name without path or extension, e.g. 'Hei'
extension (text) -- item extension including the dot, e.g. '.ttf'
type (text) -- item type, 'f' for file or 'd' for directory
mode (text) -- further type info and permissions, e.g. 'drwxr-xr-x'
parentpath (text) -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
fullpath (text) unique -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'
PATHS
1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
2) Directory paths all have a '/' on the end.
The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.
This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.
COLUMNS
linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
data (text) -- data content of the files, one entry for each line
FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.
https://spdx.org/licenses/CC0-1.0.html
Scientists at the National Center for Atmospheric Research have recently carried out several experiments to better understand the uncertainties associated with future climate projections. In particular, the NCAR Climate and Global Dynamics Lab (CGDL) working group has completed a large Parameter Perturbation Experiment (PPE) utilizing the Community Land Model (CLM), testing the effects of 32 parameters over thousands of simulations over a range of 250 years. The CLM model experiment is focused on understanding uncertainty around biogeophysical parameters that influence the balance of chemical cycling and sequestration variables. The current website for displaying model results is not intuitive or informative to the broader scientific audience or the general public. The goal of this project is to develop an improved data visualization dashboard for communicating the results of the CLM PPE. The interactive dashboard would provide an interface where new or experienced users can query the experiment database to ask which environmental processes are affected by a given model parameter, or vice versa. Improving the accessibility of the data will allow professionals to use the most recent land parameter data when evaluating the impact of a policy or action on climate change.
Methods
Data Source:
University of California, Santa Barbara – Climate and Global Dynamics Lab, National Center for Atmospheric Research: Parameter Perturbation Experiment (CGD NCAR PPE-5). https://webext.cgd.ucar.edu/I2000/PPEn11_OAAT/ (the only public version of the data currently accessible; the data leveraged in this project is currently stored on the NCAR server and is not publicly available). https://www.cgd.ucar.edu/events/seminar/2023/katie-dagon-and-daniel-kennedy-132940 (learn more about this complex data via this amazing presentation by Katie Dagon and Daniel Kennedy). The Parameter Perturbation Experiment data leveraged by our project was generated utilizing Community Land Model v5 (CLM5) predictions. https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.CLM_LAND_ONLY.html
Data Processing:
We worked inside NCAR's CASPER HPC cluster, which gave us direct access to the raw data files. We created a script to read in 500 LHC PPE simulations as a data set, with inputs for a climate variable and a time range. When reading in the cluster of simulations, a preprocess function performs dimensional reduction to simplify the data set for wrangling later.
Once the data sets of interest were loaded, they were ready for some dimensional corrections, the quirks that come with using CESM data. Our friends at NCAR CGDL provided us with the fix for the time-pairing bug. The other functions, which weight each grid cell by land area, properly weight each month according to its contribution to the number of days in a year, and calculate the global average of each simulation, were written by our team to wrangle the data so it is suitable for emulation. These files were saved so they could be leveraged later using a built-in if-else statement within the read_n_wrangle() function.
The preprocessed data is then used in the GPR ML Emulator to make 100 predictions for a climate variable of interest and 32 individual parameters. To summarize briefly without getting too into the nitty gritty, our GPR emulator does 3 things:
1. Simplifies the LHC data so it can look at 1 parameter at a time and assess its relationship with a climate variable.
2. Applies Fourier Amplitude Sensitivity Analysis to identify relationships between parameters and climate variables. It helps us see what the key influencers are.
3. In the full chaotic LHC, it can assess the covariance of the parameter-parameter predictions simultaneously (this is the R^2 value you'll see on your accuracy inset plot later).
Additionally, it 'pickles' and saves the predictions and the trained gpr_model so they can be utilized for further analysis, exploration, and visualizations. The attributes and structures defined in this notebook outline the workflow utilized to generate the data in this repo; it pulls functions from utils.py to execute the desired commands. Below we will look at the utils.py functions that are not explicitly defined in the notebook. A general side note: if you decide to explore the notebook explaining how the data was made, you'll notice you'll be transported to another repo in this organization, GaiaFuture. That's our prototype playground! It's a little messy because that's where we spent the second half of this project tinkering. The official repository is https://github.com/GaiaFuture/CLM5_PPE_Emulator.
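For readers who want a feel for the emulation step, below is a generic Gaussian process regression sketch with scikit-learn. It is not the project's emulator (that code lives in utils.py and the repositories above); the data, parameter sweep, and variable names are invented purely for illustration.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Toy stand-in for the wrangled PPE output: 500 LHC simulations,
# 32 perturbed parameters, one globally averaged climate variable.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 32))                                   # scaled parameter values
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.05, 500)    # fake response

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(X, y)

# Vary one parameter at a time across 100 points, holding the others at 0.5,
# mirroring the "100 predictions per parameter" described above.
sweep = np.tile(0.5, (100, 32))
sweep[:, 0] = np.linspace(0, 1, 100)
mean, std = gpr.predict(sweep, return_std=True)

print(mean[:5], std[:5])
print(gpr.score(X, y))   # R^2 on the training data (cf. the accuracy inset plot)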
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains meta-mathematics questions and answers collected from the Mistral-7B question-answering system. The responses, types, and queries are all provided in order to help boost the performance of MetaMathQA while maintaining high accuracy. With its well-structured design, this dataset provides users with an efficient way to investigate various aspects of question answering models and further understand how they function. Whether you are a professional or beginner, this dataset is sure to offer invaluable insights into the development of more powerful QA systems!
Data Dictionary
The MetaMathQA dataset contains three columns: response, type, and query.
- Response: the response to the query given by the question answering system. (String)
- Type: the type of query provided as input to the system. (String)
- Query: the question posed to the system for which a response is required. (String)
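A minimal pandas sketch for loading the file and inspecting these three columns (assuming train.csv ships with a header row):

import pandas as pd

df = pd.read_csv("train.csv")            # columns: response, type, query
print(df[["query", "type", "response"]].head())
print(df["type"].value_counts())         # distribution of query types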
Preparing data for analysis
Before you dive into analysis, it's important to familiarize yourself with the kinds of data values present in each column and to check whether any preprocessing is needed, such as removing unwanted characters or filling in missing values, so that the data can be used without issue while training or testing your model further down in your process flow.
##### Training Models using Mistral 7B
Mistral 7B is an open source framework designed for building machine learning models quickly and easily from tabular (CSV) datasets such as this 'MetaMathQA' dataset. After collecting and preprocessing your dataset, Mistral 7B provides support for various machine learning algorithms such as Support Vector Machines (SVM), logistic regression, and decision trees, allowing you to select these algorithms from various popular libraries together with hyperparameter optimization techniques. After selecting an algorithm configuration, it is good practice to use GridSearchCV and RandomSearchCV to further tune the optimization during the model building stage. After model selection, you can then validate the performance of the selected models through metrics such as accuracy, F1 score, precision, and recall.
##### Testing models
After the model building phase is complete, the next step is to robustly test the models against the evaluation metrics mentioned above. At this stage you can make predictions with the trained model on new test cases presented by domain experts, run quality-assurance checks against the baseline metrics, assess confidence values after execution, and update baseline scores as you run further experiments. This is the preferred methodology for AI workflows, because its core advantage is keeping the overall impact of relevance gaps and inexactness-induced errors low.
- Generating natural language processing (NLP) models to better identify patterns and connections between questions, answers, and types.
- Developing understandings on the efficiency of certain language features in producing successful question-answering results for different types of queries.
- Optimizing search algorithms that surface relevant answer results based on types of queries
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:-------------------------------------------------|
| response    | The response to the query. (String)               |
| type        | The type of query. (String)                       |
| query       | The question posed to the system. (String)        |
If you use this dataset in your research, please credit the original authors and Huggingface Hub.
DEEPEN stands for DE-risking Exploration of geothermal Plays in magmatic ENvironments. Part of the DEEPEN project involved developing and testing a methodology for a 3D play fairway analysis (PFA) for multiple play types (conventional hydrothermal, superhot EGS, and supercritical). This was tested using new and existing geoscientific exploration datasets at Newberry Volcano. This GDR submission includes images, data, and models related to the 3D favorability and uncertainty models and the 2D favorability and uncertainty maps.
The DEEPEN PFA Methodology is based on the method proposed by Poux et al. (2020), which uses the Leapfrog Geothermal software with the Edge extension to conduct PFA in 3D. This method uses all available data to build a 3D geodata model which can be broken down into smaller blocks and analyzed with advanced geostatistical methods. Each data set is imported into a 3D model in Leapfrog and divided into smaller blocks. Conditional queries can then be used to assign each block an index value which conditionally ranks each block's favorability, from 0-5 with 5 being most favorable, for each model (e.g., lithologic, seismic, magnetic, structural). The values between 0-5 assigned to each block are referred to as index values. The final step of the process is to combine all the index models to create a favorability index. This involves multiplying each index model by a given weight and then summing the resulting values.
The DEEPEN PFA Methodology follows this approach, but split up by the specific geologic components of each play type. These components are defined as follows for each magmatic play type:
1. Conventional hydrothermal plays in magmatic environments: Heat, fluid, and permeability
2. Superhot EGS plays: Heat, thermal insulation, and producibility (the ability to create and sustain fractures suitable for an EGS reservoir)
3. Supercritical plays: Heat, supercritical fluid, pressure seal, and producibility (the proper permeability and pressure conditions to allow production of supercritical fluid)
More information on these components and their development can be found in Kolker et al., 2022.
For the purposes of subsurface imaging, it is easier to detect a permeable fluid-filled reservoir than it is to detect separate fluid and permeability components. Therefore, in this analysis, we combine fluid and permeability for conventional hydrothermal plays, and supercritical fluid and producibility for supercritical plays. More information on this process is described in the following sections. We also project the 3D favorability volumes onto 2D surfaces for simplified joint interpretation, and we incorporate an uncertainty component. Uncertainty was modeled using the best approach for the dataset in question, for the datasets where we had enough information to do so. Identifying which subsurface parameters are the least resolved can help qualify current PFA results and focus future efforts in data collection. Where possible, the resulting uncertainty models/indices were weighted using the same weights applied to the respective datasets, and summed, following the PFA methodology above, but for uncertainty.
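As a schematic of the index-combination step described above (a weighted sum of 0-5 index models on a common block grid), and not the Leapfrog/Edge implementation itself, a minimal Python sketch with made-up weights and random index grids might look like this:

import numpy as np

# Hypothetical 0-5 index models on the same block grid (nx, ny, nz).
rng = np.random.default_rng(0)
lithologic = rng.integers(0, 6, size=(50, 50, 20))
seismic    = rng.integers(0, 6, size=(50, 50, 20))
magnetic   = rng.integers(0, 6, size=(50, 50, 20))
structural = rng.integers(0, 6, size=(50, 50, 20))

# Illustrative weights only; the DEEPEN workflow assigns them per play component
# (heat, fluid, permeability, etc.).
weights = {"lithologic": 0.3, "seismic": 0.3, "magnetic": 0.2, "structural": 0.2}

favorability = (weights["lithologic"] * lithologic
                + weights["seismic"] * seismic
                + weights["magnetic"] * magnetic
                + weights["structural"] * structural)

# 2D favorability map: project the 3D volume onto the surface (e.g. max over depth).
favorability_map = favorability.max(axis=2)
print(favorability_map.shape, favorability_map.max())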
There are two different versions of the Leapfrog model and associated favorability models:
- v1.0: The first release in June 2023
- v2.1: The second release, with improvements made to the earthquake catalog (included additional identified events, removed duplicate events), to the temperature model (fixed a deep BHT), and to the index models (updated the seismicity-heat source index models for supercritical and EGS, and the resistivity-insulation index models for all three play types). Also uses the jet color map rather than the magma color map for improved interpretability.
- v2.1.1: Updated to include v2.0 uncertainty results (see below for uncertainty model versions)

There are two different versions of the associated uncertainty models:
- v1.0: The first release in June 2023
- v2.0: The second release, with improvements made to the temperature and fault uncertainty models.

** Note that this submission is deprecated and that a newer submission, linked below and titled "DEEPEN Final 3D PFA Favorability Models and 2D Favorability Maps at Newberry Volcano", contains the final versions of these resources. **
The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.
Asset database for the Hunter subregion on 24 February 2016 (V2.5) supersedes the previous version of the HUN Asset database V2.4 (Asset database for the Hunter subregion on 20 November 2015, GUID: 0bbcd7f6-2d09-418c-9549-8cbd9520ce18). It contains the Asset database (HUN_asset_database_20160224.mdb), a Geodatabase version for GIS mapping purposes (HUN_asset_database_20160224_GISOnly.gdb), the draft Water Dependent Asset Register spreadsheet (BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20160224.xlsx), a data dictionary (HUN_asset_database_doc_20160224.doc), and a folder (NRM_DOC) containing documentation associated with the Water Asset Information Tool (WAIT) process as outlined below. This version should be used for the Materiality Test 2 (M2).
The Asset database is registered to the BA repository as an ESRI personal geodatabase (.mdb, doubling as an MS Access database) that can store, query, and manage non-spatial data, while the spatial data is in a separate file geodatabase joined by AID/ElementID.
Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.
Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database.
Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "HUN_asset_database_doc_20160224.doc", located in this file.
The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.
Detailed information describing the database structure and content can be found in the document "HUN_asset_database_doc_20160224.doc" located in this file.
Some of the source data used in the compilation of this dataset is restricted.
The public version of this asset database can be accessed via the following dataset: Asset database for the Hunter subregion on 24 February 2016 Public 20170112 v02 (https://data.gov.au/data/dataset/9d16592c-543b-42d9-a1f4-0f6d70b9ffe7)
OBJECTID VersionID Notes Date_
1 1 Initial database. 29/08/2014
3 1.1 Update the classification for seven identical assets from Gloucester subregion 16/09/2014
4 1.2 Added in NSW GDEs from Hunter - Central Rivers GDE mapping from NSW DPI (50 635 polygons). 28/01/2015
5 1.3 New AIDs assigned to NSW GDE assets (Existing AID + 20000) to avoid duplication of AIDs assigned in other databases. 12/02/2015
6 1.4 "(1) Add 20 additional datasets required by HUN assessment project team after HUN community workshop
(2) Turn off previous GW point assets (AIDs from 7717-7810 inclusive)
(3) Turn off new GW point asset (AID: 0)
(4) Assets (AIDs: 8023-8026) are duplicated to 4 assets (AID: 4747,4745,4744,4743 respectively) in NAM subregion . Their AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry are using
values from that NAM assets.
(5) Asset (AID 8595) is duplicated to 1 asset ( AID 57) in GLO subregion . Its AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry are using values from that GLO assets.
(6) 39 assets (AID from 2969 to 5040) are from NAM Asset database and their attributes were updated to use the latest attributes from NAM asset database
(7)The databases, especially spatial database, were changed such as duplicated attributes fields in spatial data were removed and only ID field is kept. The user needs to join the Table Assetlist or Elementlist to
the spatial data" 16/06/2015
7 2 "(1) Updated 131 new GW point assets with previous AID and some of them may include different element number due to the change of 77 FTypes requested by Hunter assessment project team
(2) Added 104 EPBC assets, which were assessed and excluded by ERIN
(3) Merged 30 Darling Hardyhead assets to one (asset AID 60140) and deleted another 29
(4) Turned off 5 assets from community workshop (60358 - 60362) as they are duplicated to 5 assets from 104 EPBC excluded assets
(5) Updated M2 test results
(6) Asset Names (AID: 4743 and 4747) were changed as requested by Hunter assessment project team (4 lower cases to 4 upper case only). Those two assets are from Namoi asset database and their asset names
may not match with original names in Namoi asset database.
(7)One NSW WSP asset (AID: 60814) was added in as requested by Hunter assessment project team. The process method (without considering 1:M relation) for this asset is not robust and is different to other NSW
WSP assets. It should NOT use for other subregions.
(8) Queries of Find_All_Used_Assets and Find_All_WD_Assets in the asset database can be used to extract all used assets and all water dependent assets" 20/07/2015
8 2.1 "(1) There are following six assets (in Hun subregion), which is same as 6 assets in GIP subregion. Their AID, Asset Name, Group, SubGroup, Depth, Source and ListDate are using values from GIP assets. You will
not see AIDs from AID_from_HUN in whole HUN asset datable and spreadsheet anymore and you only can see AIDs from AID_from_GIP ( Actually (a) AID 11636 is GIP got from MBC (B) only AID, Asset Name
and ListDate are different and changed)
(2) For BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx, (a) Extracted long ( >255 characters) WD rationale for 19 assets (AIDs:
8682,9065,9073,9087,9088,9100,9102,9103,60000,60001,60792,60793,60801,60713,60739,60751,60764,60774,60812 ) in tab "Water-dependent asset register" and 37 assets (AIDs:
5040,8651,8677,8682,8650,8686,8687,8718,8762,9094,9065,9067,9073,9077,9081,9086,9087,9088,9100,9102,9103,60000,60001,60739,60742,60751,60713,60764,60771,
60774,60792,60793,60798,60801,60809,60811,60812) in tab "Asset list" in 1.30 Excel file (b) recreated draft BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx
(3) Modified queries (Find_All_Asset_List and Find_Waterdependent_asset_register) for (2)(a)" 27/08/2015
9 2.2 "(1) Updated M2 results from the internal review for 386 Sociocultural assets
(2)Updated the class to Ecological/Vegetation/Habitat (potential species distribution) for assets/elements from sources of WAIT_ALA_ERIN, NSW_TSEC, NSW_DPI_Fisheries_DarlingHardyhead" 8/09/2015
10 2.3 "(1) Updated M2 results from the internal review
* Changed "Assessment team do not say No" to "All economic assets are by definition water dependent"
* Changed "Assessment team say No" to "These are water dependent, but excluded by the project team based on intersection with the PAE is negligible"
* Changed "Rivertyles" to "RiverStyles"" 22/09/2015
11 2.4 "(1) Updated M2 test results for 86 assets from the external review
(2) Updated asset names for two assets (AID: 8642 and 8643) required from the external review
(3) Created Draft Water Dependent Asset Register file using the template V5" 20/11/2015
12 2.5 "Total number of registered water assets was increased by 1 (= +2-1) due to:
Two assets changed M2 test from "No" to "Yes", but one asset changed M2 test from "Yes" to "No"
from the review done by Ecologist group." 24/02/2016
Bioregional Assessment Programme (2015) Asset database for the Hunter subregion on 24 February 2016. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/a39290ac-3925-4abc-9ecb-b91e911f008f.
Derived From GW Element Bores with Unknown FTYPE Hunter NSW Office of Water 20150514
Derived From Travelling Stock Route Conservation Values
Derived From Spatial Threatened Species and Communities (TESC) NSW 20131129
Derived From NSW Wetlands
Derived From Climate Change Corridors Coastal North East NSW
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is based on the model developed with the Ph.D. students of the Communication and Information Sciences Ph.D. program at the University of Hawaii at Manoa, intended to help new students get relevant information. The model was first presented at the iConference 2023, in the paper "Community Design of a Knowledge Graph to Support Interdisciplinary Ph.D. Students" by Stanislava Gardasevic and Rich Gazan (available at: https://scholarspace.manoa.hawaii.edu/server/api/core/bitstreams/9eebcea7-06fd-4db3-b420-347883e6379e/content).
The database is created in Neo4J, and the .dump file can be imported into a cloud instance of this software. The dataset (.dump) contains publicly available data collected from multiple web locations, and indexes of a sample of publications from the people in this domain. In addition, it contains my (the first author's) personal graph demonstrating progress through a student's program in this degree and activities they have done while in the program. This dataset was made possible with the huge help of my collaborator, Petar Popovic, who ingested the data into the database.
The model and dataset were developed while involving the end users in the design and are based on the actual information needs of a population. It is intended to allow researchers to investigate multigraph visualization of the data modeled by the said model.
The knowledge graph was evaluated with the CIS student population, and the study results show that it is very helpful for decision-making, information discovery, and identification of people in one's surroundings who might be good collaborators or information points. We provide the .json file containing the Neo4J Bloom perspective with the styling and queries used in these evaluation sessions.
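Once the .dump is restored (for example in Neo4j Aura or Neo4j Desktop), the graph can be queried from Python with the official neo4j driver. The connection details and the node labels and relationship types in the Cypher below are hypothetical, so adapt them to the actual schema and to the Bloom perspective provided with the dataset.

from neo4j import GraphDatabase

# Placeholder connection details for the restored database.
driver = GraphDatabase.driver("neo4j+s://<your-instance>.databases.neo4j.io",
                              auth=("neo4j", "<password>"))

# Hypothetical labels/relationships -- check the actual model before use.
cypher = """
MATCH (s:Student)-[:INTERESTED_IN]->(t:Topic)<-[:PUBLISHED_ON]-(p:Person)
RETURN t.name AS topic, collect(DISTINCT p.name) AS potential_collaborators
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(cypher):
        print(record["topic"], record["potential_collaborators"])

driver.close()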
This dataset has recent, preliminary (not quality-controlled), 1-minute, water level (tide) data from the NOAA NOS Center for Operational Oceanographic Products and Services (CO-OPS).
WARNING: These raw data have not been subjected to the National Ocean Service's quality control or quality assurance procedures and do not meet the criteria and standards of official National Ocean Service data. They are released for limited public use as preliminary data to be used only with appropriate caution.
WARNING:
* Queries for data MUST include stationID=, datum=, and time>=.
* Queries for data USUALLY include time<=.
* Queries MUST be for less than 30 days worth of data. The default time<= value corresponds to 'now'.
* Different stations support different datums. Use ERDDAP's Subset web page to find out which datums a given station supports.
* The data source isn't completely reliable. If your request returns no data when you think it should:
  * Make sure the station you specified supports the datum you specified.
  * Try revising the request (e.g., a different datum or a different time range).
  * The list of stations offering this data (or the list of datums) may be incorrect.
  * Sometimes a station or the entire data service is unavailable. Wait a while and try again.
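A hedged Python sketch of such a request is shown below. The ERDDAP base URL, dataset ID, and variable names are placeholders (only the stationID/datum/time constraints follow the rules listed above), and some HTTP clients require the constraint operators to be percent-encoded.

import pandas as pd

# Placeholders: substitute the actual ERDDAP base URL and dataset ID
# for this CO-OPS 1-minute water level dataset.
base = "https://<erddap-server>/erddap/tabledap/<datasetID>.csvp"

# Constraints per the rules above: stationID, datum and time>= are required,
# and the request must cover less than 30 days of data.
url = (base
       + "?time,waterLevel"                 # hypothetical variable names
       + "&stationID=%228454000%22"         # example station id, quotes URL-encoded
       + "&datum=%22MLLW%22"
       + "&time>=2024-01-01T00:00:00Z"
       + "&time<=2024-01-07T00:00:00Z")

df = pd.read_csv(url)
print(df.head())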
Quadrant provides Insightful, accurate, and reliable mobile location data.
Our privacy-first mobile location data unveils hidden patterns and opportunities, provides actionable insights, and fuels data-driven decision-making at the world's biggest companies.
These companies rely on our privacy-first Mobile Location and Points-of-Interest Data to unveil hidden patterns and opportunities, provide actionable insights, and fuel data-driven decision-making. They build better AI models, uncover business insights, and enable location-based services using our robust and reliable real-world data.
We conduct stringent evaluations of data providers to ensure authenticity and quality. Our proprietary algorithms detect and cleanse corrupted and duplicated data points – allowing you to leverage our datasets rapidly with minimal processing or cleaning. During the ingestion process, our proprietary Data Filtering Algorithms remove events based on a number of qualitative factors, as well as latency and other integrity variables, to provide more efficient data delivery. The deduplicating algorithm focuses on a combination of four important attributes: Device ID, Latitude, Longitude, and Timestamp. This algorithm scours our data and identifies rows that contain the same combination of these four attributes. Post-identification, it retains a single copy and eliminates duplicate values to ensure our customers only receive complete and unique datasets.
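Conceptually, the deduplication step resembles the following pandas sketch. It is for illustration only; the input file name and the exact column spellings are assumptions, but the four key attributes are those named above.

import pandas as pd

events = pd.read_csv("mobility_events.csv")   # placeholder input file

# Keep a single copy of any rows sharing the same combination of the four
# key attributes described above.
deduped = events.drop_duplicates(
    subset=["device_id", "latitude", "longitude", "timestamp"], keep="first")

print(f"Removed {len(events) - len(deduped)} duplicate events")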
We actively identify overlapping values at the provider level to determine the value each offers. Our data science team has developed a sophisticated overlap analysis model that helps us maintain a high-quality data feed by qualifying providers based on unique data values rather than volumes alone – measures that provide significant benefit to our end-use partners.
Quadrant mobility data contains all standard attributes such as Device ID, Latitude, Longitude, Timestamp, Horizontal Accuracy, and IP Address, and non-standard attributes such as Geohash and H3. In addition, we have historical data available back through 2022.
Through our in-house data science team, we offer sophisticated technical documentation, location data algorithms, and queries that help data buyers get a head start on their analyses. Our goal is to provide you with data that is “fit for purpose”.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data release provides the reanalysis streamflow data from versions 1.2, 2.0, and 2.1 of the National Water Model, restructured for time-series extraction. As a result, users can query the time series for a given NHDPlusV2 COMID without downloading the hourly CONUS files and extracting the relevant values themselves.
The data is hosted on the RENCI THREDDS Data Server and is accessible via OPeNDAP at the following URLs:
Version 1.2 (https://thredds.hydroshare.org/thredds/catalog/nwm/retrospective/catalog.html?dataset=NWM_Retrospective/nwm_retro_full.ncml): spans 1993-01-01 00:00:00 to 2017-12-31 23:00:00; contains 219,144 hourly time steps for 2,729,077 NHD reaches.
Version 2.0 (https://thredds.hydroshare.org/thredds/catalog/nwm/retrospective/catalog.html?dataset=NWM_Retrospective/nwm_v2_retro_full.ncml): spans 1993-01-01 00:00:00 to 2018-12-31 00:00:00; contains 227,903 hourly time steps for 2,729,076 NHD reaches.
Version 2.1 (https://cida.usgs.gov/thredds/catalog/demo/morethredds/nwm/nwm_v21_retro_full.ncml): spans 1979-02-02 18:00:00 to 2020-12-31 00:00:00; contains 227,903 hourly time steps for 2,729,076 NHD reaches.
Raw Data (https://registry.opendata.aws/nwm-archive/): 227,000+ hourly netCDF files (depending on version).
The Dataset Descriptor Structure (DDS) can be viewed on the NcML page for each respective resource (linked above). More broadly, each resource includes:
The nwmTools R package provides easier interaction with the OPeNDAP resources. Package documentation can be found here and the GitHub repository here.
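For users working outside R, a minimal Python sketch against the version 2.1 aggregation above might look like the following; the OPeNDAP data URL (the usual THREDDS /dodsC/ path in place of /catalog/), the variable names (feature_id, streamflow), and the example COMID are assumptions based on common NWM and THREDDS conventions, not taken from this record.

```python
import xarray as xr

# OPeNDAP data URL for the version 2.1 aggregation (assumed /dodsC/ service path
# corresponding to the catalog link listed above).
url = "https://cida.usgs.gov/thredds/dodsC/demo/morethredds/nwm/nwm_v21_retro_full.ncml"

# Open lazily over OPeNDAP; values are only transferred when requested.
ds = xr.open_dataset(url)

# Pull the hourly streamflow time series for a single NHDPlusV2 COMID.
comid = 101  # hypothetical reach identifier
flow = ds["streamflow"].sel(feature_id=comid)

# Transfer just the first day of values.
print(flow.isel(time=slice(0, 24)).values)
```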
This effort is supported by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. under the HydroInformatics Fellowship. See program here
Johnson, J.M., Blodgett, D.L., Clarke, K.C., and Pollack, J. (2020). "Restructuring and serving web-accessible streamflow data from the NOAA National Water Model historic simulations." Scientific Data. (In review)
Part of the DEEPEN (DE-risking Exploration of geothermal Plays in magmatic ENvironments) project involved developing and testing a methodology for a 3D play fairway analysis (PFA) for multiple play types (conventional hydrothermal, superhot EGS, and supercritical). This was tested using new and existing geoscientific exploration datasets at Newberry Volcano. This GDR submission includes images, data, and models related to the 3D favorability and uncertainty models and the 2D favorability and uncertainty maps.
The DEEPEN PFA Methodology, detailed in the journal article below, is based on the method proposed by Poux & O'Brien (2020), which uses the Leapfrog Geothermal software with the Edge extension to conduct PFA in 3D. This method uses all available data to build a 3D geodata model, which is broken down into smaller blocks and analyzed with advanced geostatistical methods. Each dataset is imported into a 3D model in Leapfrog and divided into smaller blocks. Conditional queries then assign each block an index value that ranks its favorability from 0 to 5, with 5 being most favorable, for each model (e.g., lithologic, seismic, magnetic, structural). The final step is to combine all the index models into a favorability index: each index model is multiplied by a given weight and the resulting values are summed.
The DEEPEN PFA Methodology follows this approach, but split up by the specific geologic components of each play type. These components are defined as follows for each magmatic play type:
1. Conventional hydrothermal plays in magmatic environments: heat, fluid, and permeability.
2. Superhot EGS plays: heat, thermal insulation, and producibility (the ability to create and sustain fractures suitable for an EGS reservoir).
3. Supercritical plays: heat, supercritical fluid, pressure seal, and producibility (the permeability and pressure conditions that allow production of supercritical fluid).
More information on these components and their development can be found in Kolker et al. (2022).
For the purposes of subsurface imaging, it is easier to detect a permeable, fluid-filled reservoir than to detect separate fluid and permeability components. Therefore, in this analysis we combine fluid and permeability for conventional hydrothermal plays, and supercritical fluid and producibility for supercritical plays. We also project the 3D favorability volumes onto 2D surfaces for simplified joint interpretation, and we incorporate an uncertainty component. Uncertainty was modeled using the best approach for each dataset, where enough information was available to do so. Identifying which subsurface parameters are least resolved can help qualify current PFA results and focus future data collection. Where possible, the resulting uncertainty models/indices were weighted using the same weights applied to the respective datasets and summed, following the PFA methodology above, but for uncertainty.
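To make the weighted-sum step concrete (the Leapfrog workflow itself is not reproduced here), a minimal NumPy sketch follows; the grid shape, the example weights, and the vertical-maximum projection are assumptions, while the 0-5 index scale and the multiply-and-sum combination come from the description above.

```python
import numpy as np

# Hypothetical index models on a common block grid, each block ranked 0-5
# (5 = most favorable). Shapes and values are purely illustrative.
rng = np.random.default_rng(0)
shape = (50, 50, 20)  # nx, ny, nz blocks (assumed grid size)
index_models = {
    "lithologic": rng.integers(0, 6, size=shape),
    "seismic": rng.integers(0, 6, size=shape),
    "magnetic": rng.integers(0, 6, size=shape),
    "structural": rng.integers(0, 6, size=shape),
}

# Assumed weights for each index model; the actual DEEPEN weights are not given here.
weights = {"lithologic": 0.3, "seismic": 0.3, "magnetic": 0.2, "structural": 0.2}

# Favorability index: weighted sum of the index models, block by block.
favorability = sum(weights[name] * model.astype(float)
                   for name, model in index_models.items())

# Project the 3D favorability volume onto a 2D map, e.g. by taking the
# maximum favorability along the vertical axis (projection choice assumed).
favorability_map = favorability.max(axis=2)
print(favorability.shape, favorability_map.shape)
```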