The dataset contains 613 news items from CanalUGR (the University of Granada Communication Office) tracked on the main online news aggregators (Google News, Yahoo! News and Bing News). For each item we include its number in CanalUGR, the media outlet, the country, and the type.
Very high resolution (VHR) satellite imagery from Google Earth and Microsoft Bing Maps is increasingly being used in a variety of applications from computer sciences to arts and humanities. In the field of remote sensing, one use of this imagery is to create reference data sets through visual interpretation, e.g., to complement existing training data or to aid in the validation of land-cover products. Through new applications such as Collect Earth, this imagery is also being used for monitoring purposes in the form of statistical surveys obtained through visual interpretation. However, little is known about where VHR satellite imagery exists globally or the dates of the imagery. Here we present a global overview of the spatial and temporal distribution of VHR satellite imagery in Google Earth and Microsoft Bing Maps. The results show an uneven availability globally, with biases in certain areas such as the USA, Europe and India, and with clear discontinuities at political borders. We also show that the availability of VHR imagery is currently not adequate for monitoring protected areas and deforestation, but is better suited for monitoring changes in cropland or urban areas using visual interpretation.
Note:
(1) Information on growing and non-growing seasons has been derived from the remote sensing product: https://lpdaac.usgs.gov/dataset_discovery/measures/measures_products_table/vipphen_ndvi_v004
(2) Google provides full global coverage by images, in contrast to Bing. However, in many areas, these are Landsat-based images (from 1984 up to now). For a more objective comparison with Bing imagery, we have excluded those areas from the analysis.
Supplement to: Lesiv, Myroslava; See, Linda; Laso-Bayas, Juan-Carlos; Sturn, Tobias; Schepaschenko, Dmitry; Karner, Mathias; Moorthy, Inian; McCallum, Ian; Fritz, Steffen (2018): Characterizing the Spatial and Temporal Availability of Very High Resolution Satellite Imagery in Google Earth and Microsoft Bing Maps as a Source of Reference Data. Land, 7(4), 118
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in
Fabian Haak and Philipp Schaer. 2023. 𝑄𝑏𝑖𝑎𝑠 - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.
Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)
The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles.
Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.
To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.
Dataset 2: Search Query Suggestions (suggestions.csv)
The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, which is approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.
The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server, whose location is saved in "location".
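As a rough illustration of the column layout described above, the following sketch counts suggestions per search engine using only the Python standard library. The column names come from the description; the column order, the lowercase engine values, the second suggestion, and the PLACEHOLDER cells are assumptions made up for this example and are not taken from suggestions.csv.

```python
import csv
import io
from collections import Counter

# Illustrative rows only: the column order, engine spellings, the second
# suggestion, and the PLACEHOLDER cells are assumptions, not real data.
SAMPLE = """root_term,query_input,query_suggestion,rank,search_engine,datetime,location
democrats,democrats a,democrats and recession,1,google,PLACEHOLDER,PLACEHOLDER
democrats,democrats a,democrats abroad,2,bing,PLACEHOLDER,PLACEHOLDER
"""


def count_by_engine(handle):
    """Count suggestion rows per value of the 'search_engine' column."""
    reader = csv.DictReader(handle)
    return Counter(row["search_engine"] for row in reader)


counts = count_by_engine(io.StringIO(SAMPLE))
print(counts)
```

To run this against the real file, one would pass an open file handle for suggestions.csv instead of the in-memory sample.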
We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
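The query expansion described above can be sketched as follows. This is a minimal illustration, not the authors' scraper: the function name `build_query_inputs` is hypothetical, and the sketch only builds the query inputs without contacting any search engine.

```python
import string


def build_query_inputs(root_term):
    """Return the root term followed by its 26 extensions 'root a' ... 'root z'."""
    return [root_term] + [f"{root_term} {letter}" for letter in string.ascii_lowercase]


inputs = build_query_inputs("democrats")

# Ten suggestions per query input gives the upper bound per topic and engine.
max_suggestions = len(inputs) * 10
print(len(inputs), max_suggestions)
```

With one root query and 26 letter extensions there are 27 query inputs, which at ten suggestions each matches the stated maximum of 270 suggestions per topic and search engine.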
AllSides Scraper
At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.
We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper, we facilitate access to a recent version of the dataset for other researchers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of 200 search results extracted from the Google and Bing search engines (100 from Google and 100 from Bing). The search terms were selected from the 10 most searched keywords of 2021 according to Google Trends. The remaining sheets report the performance of the websites on three technical evaluation aspects: SEO, speed, and security. The performance dataset was developed using the CheckBot crawling tool. The dataset as a whole can help information retrieval scientists compare the two engines in terms of position/ranking and performance on these factors.
For more information about the reasoning behind the structure of the dataset, please contact the Information Management Lab of the University of West Attica.
Contact persons: Vasilis Ntararas (lb17032@uniwa.gr), Georgios Ntimo (lb17100@uniwa.gr) and Ioannis C. Drivas (idrivas@uniwa.gr)
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data is usually updated quarterly, by February 1st, May 1st, August 1st, and November 1st.
The CEC Power Plant geospatial data layer contains point features representing power generating facilities in California, and power plants with imported electricity from Nevada, Arizona, Utah and Mexico.
The transmission line, substation and power plant mapping database was started in 1990 by CEC GIS staff. The final project was completed in October 2010. The enterprise GIS system for CEC's critical infrastructure database was led by the GIS Unit in November 2014 and was implemented in May 2016. The data was derived from CEC's Quarterly Fuel and Energy Report (QFER), Energy Facility Licensing (Siting), Wind Performance Reporting System (WPRS), and Renewable Energy Action Team (REAT). The sources for power plant point digitizing include sub-meter resolution Digital Globe, Bing, Google, ESRI and NAIP aerial imagery, at a scale of at least 1:10,000. Occasionally, USGS topographic maps, Google Street View and Bing Bird's Eye are used to verify the precise location of a facility.
Although a power plant may have multiple generators, or units, the power plant layer represents all units at a plant as one feature. Detailed attribute information associated with the power plant layer includes CEC Plant ID, Plant Label, Plant Capacity (MW), General Fuel, Plant Status, CEC Project Status, CEC Docket ID, REAT ID, Plant County, Plant State, Renewable Energy, Wind Resource Area, Local Reliability Area, Sub Area, Electric Service Area, Service Area Category, California Balancing Authorities, California Air District, California Air Basin, Quad Name, Senate District, Assembly District, Congressional District, Power Project Web Link, CEC Link, Aerial, QRERGEN Comment, WPRS Comment, Geoscience Comment, Carto Comment, QFERGEN Excel Link, WPRS Excel Link, Schedule 3 Excel Link, and CEC Data Source.
For the power plant layer joined with the QFER database, additional fields are displayed: CEC Plant Name (full name), Plant Alias, EIA Plant ID, Plant City, Initial Start Date, Online Year, Retire Date, Generator or Turbine Count, RPS Eligible, RPS Number, Operator Company Name, and Prime Mover ID. In general, spatial data are distributed for utility and non-utility operated power plants with at least 1 MW of demonstrated capacity and an operating status. A special request is required for power plant spatial data covering all capacities and all stages of status, including Cold Standby, Indefinite Shutdown, Maintenance, Non-Operational, Proposed, Retired, Standby, Terminated, and Unknown.
For questions on power generation or other topics, please contact Michael Nyberg at (916) 654-5968.
California Energy Commission's Open Data Portal.
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.
Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
(1) establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
(2) compare how much data is in institutional vs. domain-specific vs. federal platforms
(3) determine which repositories are recommended by top journals that require or recommend the publication of supporting data
(4) ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data
Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.
Search methods
We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if the institution had a repository for its unique, independent research data, if this was not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next, we searched the internet for open general data repositories using a variety of search engines; repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.
Results
A summary of the major findings from our data review:
Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.
See included README file for descriptions of each individual data file in this dataset.
Resources in this dataset:
Resource Title: Journals. File Name: Journals.csv
Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
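The evaluation thresholds described above (share of the total collection of at least 1% and 5%, and more than 100 and more than 500 results) could be applied per search term roughly as in the sketch below. The function name `summarize_term`, the dictionary keys, and the example numbers are hypothetical and are not taken from the study's data files.

```python
def summarize_term(hits, collection_size):
    """Flag one search term against the share and result-count thresholds."""
    percent = round(100 * hits / collection_size, 2)
    return {
        "percent_of_collection": percent,
        "at_least_1_percent": percent >= 1,
        "at_least_5_percent": percent >= 5,
        "over_100_results": hits > 100,
        "over_500_results": hits > 500,
    }


# Made-up numbers for illustration: 600 hits in a collection of 10,000 records.
example = summarize_term(hits=600, collection_size=10_000)
print(example)
```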
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository contains the News400 dataset introduced in the paper:
Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, and Ralph Ewerth. 2020. Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 16–25. DOI: https://doi.org/10.1145/3372278.3390670
dataset.jsonl
containing:
<entity>.jsonl
file for each entity type containing the following information for each entity:
The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository contains the TamperedNews dataset introduced in the paper:
Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, and Ralph Ewerth. 2020. Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 16–25. DOI: https://doi.org/10.1145/3372278.3390670
dataset.jsonl
containing:
entity_type.jsonl
file for each entity type containing the following information for each entity:
The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a spatial inventory of urban agriculture (UA) carried out in the city of Milan (Italy). UA areas were identified with a multi-step, iterative procedure using different web-mapping tools, especially multitemporal Google Earth images, and ancillary data such as Google Street View and Bing Maps.
License
Creative Commons CC-BY
Disclaimer
Despite our best efforts to validate the data, some information may be incorrect.
Description of the dataset
Typologies of UA
Land use typologies
Credit
Pulighe G., Lupia F. (2019) Multitemporal Geospatial Evaluation of Urban Agriculture and (Non)-Sustainable Food Self-Provisioning in Milan, Italy. Sustainability 2019, 11(7), 1846
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset contains citations from USPTO patents granted 1947-2018 to articles captured by the Microsoft Academic Graph (MAG) from 1800-2018. If you use the data, please cite these two papers:
for the dataset of citations: Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686)
for the underlying dataset of papers: Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.
The main file, pcs.tsv, contains the resolved citations. Fields are tab-separated. Each match has the patent number, MAG ID, the original citation from the patent, an indicator for whether the citation was supplied by the applicant, examiner, or unknown, and a confidence score (1-10) indicating how likely this match is correct. Note that this distribution does not contain matches with confidence 2 or 1.
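A minimal sketch of reading such a tab-separated file and keeping only high-confidence matches is shown below. It is not the authors' code: the column names in `COLUMNS`, the assumed field order, the assumption that the file has no header row, and the placeholder sample rows are all hypothetical and should be checked against pcs.tsv and _relianceonscience.pdf before use.

```python
import csv
import io

# Hypothetical column names and order, following the field list in the text.
COLUMNS = ["patent", "magid", "citation", "source", "confidence"]

# Placeholder rows for illustration only; these are not real matches.
SAMPLE = (
    "P0000001\t111\tplaceholder citation A\tapplicant\t10\n"
    "P0000002\t222\tplaceholder citation B\texaminer\t4\n"
)


def load_matches(handle, min_confidence=3):
    """Read tab-separated matches and keep those at or above min_confidence."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    matches = []
    for row in reader:
        record = dict(zip(COLUMNS, row))
        # Assumes no header row; a header would fail this integer conversion.
        record["confidence"] = int(record["confidence"])
        if record["confidence"] >= min_confidence:
            matches.append(record)
    return matches


high = load_matches(io.StringIO(SAMPLE), min_confidence=8)
print(high)
```

The default threshold of 3 mirrors the note that matches with confidence 2 or 1 are not distributed; a stricter cutoff trades recall for precision.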
There is also a PubMed-specific match in pcs-pubmed.tsv.
The remaining files are a redistribution of the 1 January 2019 release of the Microsoft Academic Graph. All of these files are compressed using ZIP compression under CentOS5. Original files, documented at https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema, can be downloaded from https://aka.ms/msracad; this redistribution carves up the original files into smaller, variable-specific files that can be loaded individually (see _relianceonscience.pdf for full details).
Source code for generating the patent citations to science in pcs.tsv is available at https://github.com/mattmarx/reliance_on_science. Source code for generating jif.zip and jcif.zip (Journal Impact Factor and Journal Commercial Impact Factor) is at https://github.com/mattmarx/jcif.
Although MAG contains authors and affiliations for each paper, it does not contain the location for affiliations. We have created a dataset of locations for affiliations appearing at least 100x using Bing Maps and Google Maps; however, it is unclear to us whether the API licensing terms allow us to repost their data. In any case, you can download our source code for doing so here: https://github.com/ksjiaxian/api-requester-locations.
MAG extracts field keywords for each paper (paperfieldid.zip and fieldidname.zip) --more than 200,000 fields in all! When looking to study industries or technical areas you might find this a bit overwhelming. We mapped the MAG subjects to six OECD fields and 39 subfields, defined here: http://www.oecd.org/science/inno/38235147.pdf. Clarivate provides a crosswalk between the OECD classifications and Web of Science fields, so we include WoS fields as well. This file is magfield_oecd_wos_crosswalk.zip.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Data Analysis Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP
This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org. It contains AI agents' meta information, such as each agent's name, website, and description, as well as monthly updated web performance metrics, including Google and Bing average search ranking positions, GitHub stars, arXiv references, etc. The dataset is helpful for AI… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/data-analysis-ai-agent.
Research Data Australia is an Internet-based collection designed to promote visibility of Australian research data in search engines such as Google and Bing. Research Data Australia aims to provide a comprehensive window into the Australian Research Data Commons. It provides connections between data, projects, researchers and services across organisations and discipline
Research is producing larger and more complex data than ever before. It is imperative that these data outputs are effectively managed and shared. Better data – better described, more connected, more integrated and organised, more accessible, more easily used for new purposes – allows new questions to be investigated, larger issues to be investigated, and data landscapes to be explored.
The 2015 edition of the Land Use keeps the classification unchanged and all the classes of the level have been updated. Updated polygons show the date of the modification. The natural color aerial images of 2013 and infrared of 2010, provided by AGEA and the high resolution images of Google Earth and Bing Maps 2015 were analysed. The Land Use update sc. 1:10000 - Ed. 2018. - Coverage - Entire Regional Territory - Origin - Photoanalysis and photointerpretation of Agea 2013 digital aerial images and high resolution Google Earth and Bing Maps 2015 satellite images.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository contains the TamperedNews and News400 datasets introduced in the paper:
Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, Sherzod Hakimov, and Ralph Ewerth. "Multimodal news analytics using measures of cross-modal entity and context consistency". In: International Journal of Multimedia Information Retrieval 10.2 (2021), Springer, pp. 111–125. DOI: https://doi.org/10.1007/s13735-021-00207-4
For both datasets TamperedNews and News400, we provide the:
*dataset*.tar.gz
containing the *dataset*.jsonl
with
*dataset*_features.tar.gz
with visual features for events, locations, and persons
news400_wordembeddings.tar.gz
: Word embeddings of all nouns in the news texts of the News400 dataset
Please note that the word embeddings of the TamperedNews dataset (tamperednews_wordembeddings.tar.gz) have already been provided in the first version.
For all entities detected in both datasets, we provide:
entities.tar.gz
containing an *entity_type*.jsonl
for all entity types (events, locations, and persons) with:
entities_features.tar.gz
containing the visual features of the reference images for all entities
The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Forests of Australia (2023) is a continental spatial dataset of forest extent, by national forest categories and types, assembled for Australia's State of the Forests Report. It was developed from multiple forest, vegetation and land cover data inputs, including contributions from Australian, state and territory government agencies and external sources.
A forest is defined in this dataset as "An area, incorporating all living and non-living components, that is dominated by trees having usually a single stem and a mature or potentially mature stand height exceeding two metres and with existing or potential crown cover of overstorey strata about equal to or greater than 20 per cent. This includes Australia's diverse native forests and plantations, regardless of age. It is also sufficiently broad to encompass areas of trees that are sometimes described as woodlands".
The dataset was compiled by the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES) for the National Forest Inventory (NFI), a collaborative partnership between the Australian and state and territory governments. The role of the NFI is to collate, integrate and communicate information on Australia's forests. State and territory government agencies collect forest data using independent methods and at varying scales or resolutions. The NFI applies a national classification to state and territory data to allow seamless integration of these datasets. Multiple independent sources of external data are used to fill data gaps and improve the quality of the final dataset.
The NFI classifies forests into three national forest categories (Native forest, Commercial plantation, and Other forest) and then into various forest types.
Commercial plantations presented in this dataset were sourced from the National Plantation Inventory (NPI) spatial dataset (2021), also produced by ABARES.
Another dataset produced by ABARES, the Catchment scale land use of Australia CLUM dataset (2020), was used to identify and mask out land uses that are inappropriate to map as forest.
The Forests of Australia (2023) dataset is produced to fulfil requirements of Australia's National Forest Policy Statement and the Regional Forests Agreement Act 2002 (Cwth) and is used by the Australian Government for domestic and international reporting.
Previous versions of this dataset are available on the Forests Australia website spatial data page and the Australian Government open government data portal data.gov.au.
Currency
Date modified: 30 November 2023
Modification frequency: Every 5 years
Data extent
Spatial extent
North: -8.2°
South: -44.4°
East: 157.2°
West: 109.5°
Source information
Data, Metadata, Maps and Interactive views are available from ABARES website.
Forests of Australia (2023) – Descriptive metadata.
The data was obtained from Department of Agriculture, Fisheries and Forestry - Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES). ABARES is providing this data to the public under a Creative Commons Attribution 4.0 license.
Lineage statement
Presented on this page is a summarised lineage on the development of state and territory datasets for Forests of Australia (2023). The dataset has been produced using the Multiple Lines of Evidence (MLE) method for publication in the Australia’s State of the Forests Report – 2023 update. Detailed lineage information can be found here.
Forests of Australia (2023) is a continental spatial dataset of forest extent, by national forest categories and types, assembled for Australia's State of the Forests Report – 2023 update.
It was developed from multiple forest, vegetation and land cover data inputs, including contributions from Australian, state and territory government agencies and external sources.
For each state or territory, except for the ACT where there was no new data, intersection of the Forests of Australia (2018) dataset with a forest cover dataset supplied by the jurisdiction, and with other available and appropriate independent forest cover datasets, identified:
High confidence areas – areas where all the examined datasets agreed with the Forests of Australia (2018) dataset that the areas were forest or non-forest. No further assessment was required for these areas.
Moderate confidence areas – areas where the Forests of Australia (2018) dataset agreed with the forest cover dataset supplied by state or territory, and with external or independent datasets, that the areas were forest or non-forest. These areas were identified as potential errors and needed further analysis in order to determine the correct allocation (forest or non-forest). The required analyses and validation were conducted by ABARES, in consultation with relevant state and territory agencies, using various ancillary data including high-resolution imagery such as World Imagery by ESRI, Bing Maps and Google Earth Pro.
Low confidence areas – areas where the Forests of Australia (2018) dataset disagreed with the forest cover dataset supplied by state or territory, and with external or independent datasets, that the areas were forest or non-forest. All such areas were identified as potential errors and needed further analysis in order to determine the correct allocation (forest or non-forest).
The required analyses and validation were conducted by ABARES, in consultation with relevant state and territory agencies, using various ancillary data including high-resolution imagery such as World Imagery by ESRI, Bing Maps and Google Earth Pro.
External or independent datasets used include:
The H_Woody_Fuzzy_2_Class dataset is based on the NGGI dataset produced by DCCEEW from Landsat data and was developed to support the New South Wales Natural Resources Commission’s (NRC) Monitoring, Evaluation and Reporting Program. NRC applied Fuzzy Logic and Probability modelling to the NGGI dataset to derive annual layers distinguishing between forest and non-forest at 25 m raster resolution. Each of five annual layers, 2015 to 2019, was resampled to a 100 m raster by classifying as forest the 100 m pixels that had more than half their area as forest as determined from 25 m pixels. The five annual layers were combined and every pixel in the combination that had been classified as forest in any year during the 2015-2019 period was allocated as forest (and the balance non-forest). This approach was taken to prevent areas where the crown cover had reduced temporarily below 20%, through events such as fire, harvesting, drought or disease, from being incorrectly classified as non-forest.
The State-wide Land and Tree Study (SLATS) dataset is based on data collected by the Landsat satellite. This dataset was available for Queensland only. Foliage Projective Cover (FPC) values of 11 or greater (equivalent to crown cover 20% or greater) were considered as forest candidates in this SLATS dataset. 
The National Vegetation Information System (NVIS) version 6.0 dataset was used to identify areas in this SLATS dataset that met the height requirements of the forest definition used by the National Forest Inventory.
The National Greenhouse Gas Inventory (NGGI) dataset is produced from Landsat satellite Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+) and Operational Land Imager (OLI) images for the Australian Government Department of Climate Change, Energy, the Environment and Water (DCCEEW), and identifies woody vegetation of height or potential height greater than 2 metres, crown cover greater than 20%, and with a minimum patch size of 0.2 hectares (DISER, 2021a). The dataset is compiled using time-series data since 1972 and is produced at a 25 m × 25 m resolution. The NGGI dataset used was developed from the five annual layers (2016-2020, inclusive) from the ‘National Forest and sparse woody vegetation data (Version 5.0)’ spatial dataset produced using the algorithms for land-use change allocation developed for the National Inventory Reports (DISER, 2021b). Each layer of the original 25 m resolution, three-class (forest, sparse woody and non-forest) dataset was resampled to a binary (forest and non-forest) 100 m raster by classifying as forest the 100 m pixels that had more than half their area as forest; the sparse woody and non-forest classes were combined into a non-forest class. The five annual layers were then combined and every pixel in the combination that had been classified as forest in any year during the 2016-2020 period was allocated as forest (and the balance non-forest). 
This approach was taken to prevent areas where the crown cover had reduced temporarily below 20%, through events such as fire, harvesting, drought or disease, from being incorrectly classified as non-forest.
All input datasets were converted to 100 m rasters (ESRI GRID format), aligning with relevant standard NFI state or territory masks (also known as NFI SNAP grids), in Albers projection. Where the input dataset was in polygon format, the Polygon to Raster tool was used to convert the polygon dataset to raster format, using the Maximum_Combined_Area option.
Validation assessment results were incorporated to give improved and high-confidence forest cover datasets for each state or territory.
Look-up tables translating the state or territory forest cover data to NFI forest types were used where provided. Where this information was not provided, it was derived by ABARES from translating Levels 5 and 6 of the National Vegetation Information System (NVIS) version 6.0 attribute information to NFI forest types.
This dataset has been converted from GeoTIFF to Multidimensional Cloud Raster Format (CRF) to facilitate publishing to the Digital Atlas of Australia (DAA).
Date of extraction: February 2024.
Data dictionary
Attribute name: Description
VALUE: Identifier of every unique combination of the following attributes: STATE, FOR_SOURCE, FOR_CODE, FOR_TYPE, FOR_CAT, HEIGHT and COVER.
COUNT: Number of cells that belong to a particular VALUE. For this dataset, in which cell resolution is 100 by 100 metres, each cell covers one hectare, so COUNT also gives the area in hectares.
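The lineage above describes two raster steps: resampling each 25 m annual layer to 100 m with a "more than half the area is forest" rule, then marking a 100 m pixel as forest if it was forest in any of the annual layers. The following is a minimal pure-Python sketch of that logic, not ABARES's actual processing chain. It assumes each layer is already binary (1 = forest, 0 = non-forest, with sparse woody folded into non-forest) and is held as a nested list; the function names `resample_majority` and `combine_any_year` are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # assumed encoding: 1 = forest, 0 = non-forest


def resample_majority(grid: Grid, factor: int = 4) -> Grid:
    """Aggregate factor x factor blocks of fine cells into one coarse cell.

    A coarse cell is forest only if MORE than half of its fine cells are
    forest, so an exact 50/50 block becomes non-forest. With factor=4,
    sixteen 25 m cells make one 100 m cell.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse: Grid = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            forest_cells = sum(
                grid[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            coarse_row.append(1 if 2 * forest_cells > factor * factor else 0)
        coarse.append(coarse_row)
    return coarse


def combine_any_year(layers: List[Grid]) -> Grid:
    """Mark a cell as forest if it is forest in at least one annual layer."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [1 if any(layer[r][c] for layer in layers) else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Two toy annual layers, each 4 x 8 fine cells (two coarse cells side by side).
year_a = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]  # left block: 9 of 16 forest; right block: 8 of 16 forest
year_b = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]  # right block fully forest

coarse_a = resample_majority(year_a)
coarse_b = resample_majority(year_b)
combined = combine_any_year([coarse_a, coarse_b])
```

In this toy case the right-hand block is non-forest in the first year (exactly half its area is forest) but forest in the second, so the combined layer keeps it as forest, which is the behaviour the lineage describes for temporary reductions in crown cover.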
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
World elevation dataset
High-resolution dataset containing world elevation above sea level, in meters. See the Python example to get the estimated elevation from a coordinate.
Info
This dataset comprises global elevation data sourced from ASTER GDEM, which has been compressed and retiled for efficiency. The retiled data adheres to the common web map tile convention used by platforms such as OpenStreetMap, Google Maps, and Bing Maps, providing compatibility with zoom… See the full description on the dataset page: https://huggingface.co/datasets/Upabjojr/elevation-data-ASTER-compressed-retiled.
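For orientation, the common web map tile convention mentioned above maps a latitude/longitude pair to integer tile indices at a given zoom level. The sketch below shows only that standard Web Mercator tile arithmetic; the function name `latlon_to_tile` is hypothetical, and how this particular dataset names or stores its tiles is not stated here, so it should be checked on the dataset page.

```python
import math
from typing import Tuple

# Web Mercator tiling is only defined up to about +/-85.0511 degrees latitude.
MAX_LAT = 85.05112878


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return (x, y) tile indices in the usual web map scheme.

    x counts tiles eastward from longitude -180; y counts tiles southward
    from the northern edge of the map. There are 2**zoom tiles per axis.
    """
    n = 2 ** zoom
    lat = max(min(lat_deg, MAX_LAT), -MAX_LAT)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


origin_tile = latlon_to_tile(0.0, 0.0, 1)
london_tile = latlon_to_tile(51.5, -0.12, 10)
```

Once the tile indices are known, a pixel offset inside the tile can be derived from the fractional part of the same expressions, which is typically how a single elevation sample would be looked up.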
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Autonomous Agent Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP
This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org. It contains AI agents' meta information, such as each agent's name, website and description, as well as monthly updated web performance metrics, including average search ranking positions on Google and Bing, GitHub stars, arXiv references, etc. The dataset is helpful for… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/autonomous-agent.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was created by UNIFI-ICARDA to list all the soil and water conservation and restoration interventions in the Badia region of Jordan. All the interventions were georeferenced using public domain remote sensing data. Google Earth (GE) was the principal source of information, although GE image coverage of Badia is heterogeneous and some Badia areas are covered only by old images. In these areas, a cross evaluation was done using Bing Satellite Imagery, where it offered more recent images. The identified interventions are divided into categories: 58 dams, 203 tanks, 841 check dams, 51 contour structure interventions, 27 reforestation sites, 30 desert restorations and 115 spontaneous interventions. A dam is an artificial structure, typically made of concrete or earth, built to impound water, creating a permanent or semi-permanent basin along a main waterway to support agriculture in the surrounding area. A tank is an artificial water pond built by private or governmental initiative to collect and store seasonal water flows for agricultural or animal husbandry purposes. Check dams are linear structures made of earth (sometimes rock or concrete), built transversal to the stream flow in the ephemeral river bed to slow down the flow of water and increase groundwater recharge, as well as for agricultural and anti-erosion purposes. Contour structures consist of linear micro-catchment interventions characterized by ridges and furrows traced along the contour lines, where drought-tolerant shrub species are planted in ridges to provide a source of fodder for livestock. They are divided into continuous and discontinuous types. Reforestation interventions consist of vast land works following the contour lines along the slopes, where vegetation was established. 
Desert restoration interventions include three different types of intervention: proper desert restoration such as dune fixation structures, afforestation interventions, and aquifer recharge structures. Spontaneous hydraulic interventions seem to be similar in purpose to check dams, although they show a more irregular structure and distribution pattern. These structures are often closely connected with other agricultural structures and are probably linked to private initiative. The mapped dams and tanks were compared with two other datasets generated by Jordanian institutions: the CAP database (list of Badia Restoration Program interventions) and the WAJ database (list of Jordan Valley Authority interventions). This resulted in some correspondences, although the comparison was only partially possible because the available satellite images were in several cases too old, particularly in eastern Badia. In other cases, even where recent images were available, the structures were not found (on either GE or Bing), suggesting possible errors in the coordinates given in the two datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset of the queries and chatbot responses for the paper The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for NatQuest
NatQuest is a dataset of natural questions asked by online users on ChatGPT, Google, Bing and Quora. Repository: See the Repo for more details about the dataset.
Source Data
The sources of the data are the following:
MSMarco, NaturalQuestions, Quora Question Pairs, ShareGPT, WildChat
Recommendations
Be aware that the dataset might contain NSFW content. Also, as discussed in the paper, the filtering procedures of each source might… See the full description on the dataset page: https://huggingface.co/datasets/causal-nlp/NatQuest.