26 datasets found

Data from: News of CanalUGR tracked on Google News, Yahoo! News and Bing...
explore.openaire.eu
data.niaid.nih.gov
Updated Jul 3, 2013
Cite
Víctor Herrero Solana; Luis Leonardo Arboledas Márquez; Elisa Legerén-Álvarez (2013). News of CanalUGR tracked on Google News, Yahoo! News and Bing News [Dataset]. http://doi.org/10.30827/digibug.31174
Explore at:
Unique identifier
https://doi.org/10.30827/digibug.31174
Dataset updated
Jul 3, 2013
Authors
Víctor Herrero Solana; Luis Leonardo Arboledas Márquez; Elisa Legerén-Álvarez
Description
The dataset contains 613 news items from CanalUGR (University of Granada Communication Office) tracked on the main online news aggregators (Google News, Yahoo! News and Bing News). For each item we include: number in CanalUGR, media, country, type.
A global snapshot of the spatial and temporal distribution of very high...
b2find.eudat.eu
Updated Oct 20, 2023
+ more versions
Cite
(2023). A global snapshot of the spatial and temporal distribution of very high resolution satellite imagery in Google Earth and Bing Maps as of 11th of January, 2017 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/017c0447-4356-5fec-8c32-da26aa4d9385
Explore at:
Dataset updated
Oct 20, 2023
Description
Very high resolution (VHR) satellite imagery from Google Earth and Microsoft Bing Maps is increasingly being used in a variety of applications from computer sciences to arts and humanities. In the field of remote sensing, one use of this imagery is to create reference data sets through visual interpretation, e.g., to complement existing training data or to aid in the validation of land-cover products. Through new applications such as Collect Earth, this imagery is also being used for monitoring purposes in the form of statistical surveys obtained through visual interpretation. However, little is known about where VHR satellite imagery exists globally or the dates of the imagery. Here we present a global overview of the spatial and temporal distribution of VHR satellite imagery in Google Earth and Microsoft Bing Maps. The results show an uneven availability globally, with biases in certain areas such as the USA, Europe and India, and with clear discontinuities at political borders. We also show that the availability of VHR imagery is currently not adequate for monitoring protected areas and deforestation, but is better suited for monitoring changes in cropland or urban areas using visual interpretation. Note: (1) Information on growing and non-growing seasons has been derived from the remote sensing product: https://lpdaac.usgs.gov/dataset_discovery/measures/measures_products_table/vipphen_ndvi_v004 (2) Google provides full global coverage by images, in contrast to Bing. However, in many areas, these are Landsat-based images (from 1984 up to now). For a more objective comparison with Bing imagery, we have excluded those areas from the analysis.
Supplement to: Lesiv, Myroslava; See, Linda; Laso-Bayas, Juan-Carlos; Sturn, Tobias; Schepaschenko, Dmitry; Karner, Mathias; Moorthy, Inian; McCallum, Ian; Fritz, Steffen (2018): Characterizing the Spatial and Temporal Availability of Very High Resolution Satellite Imagery in Google Earth and Microsoft Bing Maps as a Source of Reference Data. Land, 7(4), 118
Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...
zenodo.org
data.niaid.nih.gov
csv
Updated Mar 1, 2023
Cite
Fabian Haak; Philipp Schaer (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. http://doi.org/10.5281/zenodo.7682915
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.7682915
Dataset updated
Mar 1, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Fabian Haak; Philipp Schaer
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022 as presented in our publication. The AllSides balanced news features three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin bias, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles.
Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.
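
As a quick consistency check, the three label counts above can be summed and turned into class shares. This is only a sketch over the numbers quoted in the description, not code from the Qbias repository; the variable names are illustrative.

```python
# Label counts as reported in the dataset description.
label_counts = {"left": 10_273, "right": 7_222, "center": 4_252}

# The counts should add up to the 21,747 articles stated above.
total = sum(label_counts.values())

# Fraction of the dataset carried by each label.
shares = {label: count / total for label, count in label_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print("total:", total)
```

The classes are noticeably imbalanced (roughly 47% left, 33% right, 20% center), which is worth keeping in mind when fine-tuning or evaluating a classifier on this data.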

To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

Dataset 2: Search Query Suggestions (suggestions.csv)

The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times, which is approximately half of the total number of topics. The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their respective positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server; the server location is saved in "location".
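
A minimal sketch of reading a file with these columns is shown below. The column names come from the description; the comma delimiter, the presence of a header row, the column order, and every sample value are assumptions made for illustration and should be checked against the actual suggestions.csv.

```python
import csv
import io
from collections import Counter

# Made-up sample rows using the column names from the description.
# Delimiter, header row, column order and all values are assumptions.
SAMPLE = """root_term,query_input,query_suggestion,rank,search_engine,datetime,location
democrats,democrats a,democrats and recession,1,google,2022-11-01 10:00:00,US
democrats,democrats a,democrats abroad,2,google,2022-11-01 10:00:00,US
democrats,democrats a,democrats and republicans,1,bing,2022-11-01 10:00:05,US
"""

def load_suggestions(handle):
    """Read suggestion rows into dicts, converting rank to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["rank"] = int(row["rank"])
        rows.append(row)
    return rows

rows = load_suggestions(io.StringIO(SAMPLE))

# Number of suggestions collected per search engine.
per_engine = Counter(row["search_engine"] for row in rows)

# Suggestions that were returned at the first position.
top_ranked = [r["query_suggestion"] for r in rows if r["rank"] == 1]

print(per_engine)
print(top_ranked)
```

For the real file, replace the in-memory sample with `open("suggestions.csv", newline="", encoding="utf-8")`.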

We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
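
The query expansion described above can be sketched as follows. This is an illustration of the scheme (root term plus the root term extended by each letter a to z), not the authors' scraper; the function and constant names are hypothetical.

```python
import string

# Ten suggestions are retrieved per query input, as stated in the description.
SUGGESTIONS_PER_INPUT = 10

def build_query_inputs(root_term):
    """Return the root term followed by the root term extended with a..z."""
    extended = [f"{root_term} {letter}" for letter in string.ascii_lowercase]
    return [root_term] + extended

inputs = build_query_inputs("democrats")

# 27 inputs (root + 26 letters) times 10 suggestions gives the upper bound.
max_suggestions = len(inputs) * SUGGESTIONS_PER_INPUT

print(inputs[:3])
print(max_suggestions)
```

The 27 inputs per root term account for the stated ceiling of 270 suggestions per topic and search engine; the actual count can be lower when an engine returns fewer than ten suggestions.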

AllSides Scraper

At https://github.com/irgroup/Qbias, we provide a scraping tool that allows for the automatic retrieval of all available articles from the AllSides balanced news headlines.

We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, which scrapes all available AllSides news articles and gathers the available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.
Search Engines Comparison and Websites Performance
data.niaid.nih.gov
zenodo.org
Updated Jul 1, 2023
Cite
Ntararas, Vasilios (2023). Search Engines Comparison and Websites Performance [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8102699
Explore at:
Dataset updated
Jul 1, 2023
Dataset provided by
Ntimo, Georgios
Ntararas, Vasilios
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of 200 search results extracted from the Google and Bing search engines (100 from Google and 100 from Bing). The search terms were selected from the 10 most searched keywords of 2021 according to Google Trends. The remaining sheets record the performance of the websites on three technical evaluation aspects: SEO, speed, and security. The performance dataset was built with the CheckBot crawling tool. The whole dataset can help information retrieval researchers compare the two engines in terms of result position/ranking and website performance on these factors.

For more information about the rationale behind the structure of the dataset, please contact the Information Management Lab of the University of West Attica.

Contact Persons: Vasilis Ntararas (lb17032@uniwa.gr), Georgios Ntimo (lb17100@uniwa.gr) and Ioannis C. Drivas (idrivas@uniwa.gr)
California Electric Power Plants - Dataset - CKAN
nationaldataplatform.org
Updated Feb 28, 2024
Cite
(2024). California Electric Power Plants - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/california-electric-power-plants
Explore at:
Dataset updated
Feb 28, 2024
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
California
Description
This data is usually updated quarterly by February 1st, May 1st, August 1st, and November 1st. The CEC Power Plant geospatial data layer contains point features representing power generating facilities in California, and power plants with imported electricity from Nevada, Arizona, Utah and Mexico. The transmission line, substation and power plant mapping database was started in 1990 by the CEC GIS staff. The final project was completed in October 2010. The enterprise GIS system on CEC's critical infrastructure database was led by the GIS Unit in November 2014 and was implemented in May 2016. The data was derived from CEC's Quarterly Fuel and Energy Report (QFER), Energy Facility Licensing (Siting), Wind Performance Reporting System (WPRS), and Renewable Energy Action Team (REAT). The sources for the power plant point digitizing include sub-meter resolution Digital Globe, Bing, Google, ESRI and NAIP aerial imagery, with a scale of at least 1:10,000. Occasionally, USGS Topographic maps, Google Street View and Bing Bird's Eye are used to verify the precise location of a facility. Although a power plant may have multiple generators, or units, the power plant layer represents all units at a plant as one feature. Detailed attribute information associated with the power plant layer includes CEC Plant ID, Plant Label, Plant Capacity (MW), General Fuel, Plant Status, CEC Project Status, CEC Docket ID, REAT ID, Plant County, Plant State, Renewable Energy, Wind Resource Area, Local Reliability Area, Sub Area, Electric Service Area, Service Area Category, California Balancing Authorities, California Air District, California Air Basin, Quad Name, Senate District, Assembly District, Congressional District, Power Project Web Link, CEC Link, Aerial, QRERGEN Comment, WPRS Comment, Geoscience Comment, Carto Comment, QFERGEN Excel Link, WPRS Excel Link, Schedule 3 Excel Link, and CEC Data Source.
For the power plant layer joined with the QFER database, additional fields are displayed: CEC Plant Name (full name), Plant Alias, EIA Plant ID, Plant City, Initial Start Date, Online Year, Retire Date, Generator or Turbine Count, RPS Eligible, RPS Number, Operator Company Name, and Prime Mover ID. In general, utility and non-utility operated power plant spatial data with at least 1 MW of demonstrated capacity and operating status are distributed. A special request is required for power plant spatial data with all capacities and all stages of status, including Cold Standby, Indefinite Shutdown, Maintenance, Non-Operational, Proposed, Retired, Standby, Terminated, and Unknown. For questions on power generation or other topics, please contact Michael Nyberg at (916) 654-5968. California Energy Commission's Open Data Portal.
Data from: Inventory of online public databases and repositories holding...
catalog.data.gov
s.cnmilf.com
+4 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

establish where agricultural researchers in the United States-- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals

compare how much data is in institutional vs. domain-specific vs. federal platforms

determine which repositories are recommended by top journals that require or recommend the publication of supporting data

ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals were compiled, in which USDA published in 2012 and 2016, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.

There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.

Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:

Resource Title: Journals. File Name: Journals.csv

Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv

Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx

Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv

Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv

Resource Title: General repositories containing ag data. File Name: general_repos_1.csv

Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
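
The evaluation thresholds mentioned in the description (a search term covering at least 1% or 5% of a collection, and returning more than 100 or more than 500 results) can be expressed as a small helper. This is a sketch under stated assumptions: the function name, the dictionary keys, and the example numbers are hypothetical and are not taken from the dataset files.

```python
def evaluate_term(term_hits, total_records):
    """Return the collection share and threshold flags for one search term."""
    share = term_hits / total_records if total_records else 0.0
    return {
        "share": share,
        "at_least_1_percent": share >= 0.01,
        "at_least_5_percent": share >= 0.05,
        "over_100_results": term_hits > 100,
        "over_500_results": term_hits > 500,
    }

# Hypothetical numbers for illustration only.
result = evaluate_term(term_hits=600, total_records=10_000)
print(result)
```

A repository that is both large and ag-heavy, in the sense used in the findings, would be one where a term sets both `over_500_results` and `at_least_5_percent`.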
News400 Dataset
data.uni-hannover.de
tar
Updated Jan 20, 2022
+ more versions
Cite
TIB (2022). News400 Dataset [Dataset]. https://data.uni-hannover.de/dataset/news400
Explore at:
Available download formats: tar(7478569), tar(471094303), tar(43316706)
Dataset updated
Jan 20, 2022
Dataset authored and provided by
TIB
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

This repository contains the News400 dataset introduced in the paper:

Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, and Ralph Ewerth. 2020. Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 16–25. DOI: https://doi.org/10.1145/3372278.3390670

Content

news400.tar.gz:

dataset.jsonl containing:

Web links to the news texts

Web links to the news image

Outputs of the named entity recognition and disambiguation (NERD) approach

Untampered and tampered entities

<entity>.jsonl file for each entity type containing the following information for each entity:

Wikidata ID

Wikidata label

Meta information used for tampering

Web links to all reference images crawled from Google, Bing, and Wikidata

splits for testing and validation

news400_features.tar.gz:

Visual features of the news images for persons, locations, and scenes

Visual features of the reference images for persons, locations, and scenes

news400_wordembeddings.tar.gz: Word embeddings of all nouns in the news texts
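
Files such as dataset.jsonl follow the JSON Lines layout, with one JSON object per line. The reader below is a generic sketch; the keys in the sample records are made up, and the real field names in the News400 files may differ.

```python
import io
import json

def read_jsonl(handle):
    """Yield one parsed JSON object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)

# Made-up records; the real keys in dataset.jsonl may differ.
sample = io.StringIO(
    '{"id": "doc1", "text_url": "https://example.org/a"}\n'
    "\n"
    '{"id": "doc2", "text_url": "https://example.org/b"}\n'
)
records = list(read_jsonl(sample))
print(len(records), records[0]["id"])
```

For the extracted archive, pass `open("dataset.jsonl", encoding="utf-8")` instead of the in-memory sample. Reading line by line keeps memory use flat even for large files.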

Source Code

The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
TamperedNews Dataset
data.uni-hannover.de
partaa, partab +15
Updated Jan 20, 2022
Cite
TIB (2022). TamperedNews Dataset [Dataset]. https://data.uni-hannover.de/dataset/tamperednews
Explore at:
Available download formats: partaa(524288000), partac(524288000), partab(524288000), tar(74703162), partaj(524288000), partal(524288000), partap(189537392), partan(524288000), partam(524288000), partah(524288000), partae(524288000), partai(524288000), partad(524288000), partag(524288000), partaf(485776675), partao(524288000), partak(524288000), partaf(524288000)
Dataset updated
Jan 20, 2022
Dataset authored and provided by
TIB
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

This repository contains the TamperedNews dataset introduced in the paper:

Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, and Ralph Ewerth. 2020. Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency. In Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR '20). Association for Computing Machinery, New York, NY, USA, 16–25. DOI: https://doi.org/10.1145/3372278.3390670

Content

tamperednews.tar.gz:

dataset.jsonl containing:

Web links to the news texts

Web links to the news image

Outputs of the named entity recognition and disambiguation (NERD) approach

Untampered and tampered entities

entity_type.jsonl file for each entity type containing the following information for each entity:

Wikidata ID

Wikidata label

Meta information used for tampering

Web links to all reference images crawled from Google, Bing, and Wikidata

splits for testing and validation

tamperednews_features.tar.gz:

Visual features of the news images for persons, locations, and scenes

Visual features of the reference images for persons, locations, and scenes

tamperednews_wordembeddings.tar.gz: Word embeddings of all nouns in the news texts

Source Code

The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
Milan (ITALY) - Urban Agriculture spatial dataset (years 2007 and 2014)
zenodo.org
data.niaid.nih.gov
zip
Updated Dec 11, 2021
Cite
Pulighe Giuseppe; Lupia Flavio (2021). Milan (ITALY) - Urban Agriculture spatial dataset (years 2007 and 2014) [Dataset]. http://doi.org/10.5281/zenodo.5773844
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5773844
Dataset updated
Dec 11, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Pulighe Giuseppe; Lupia Flavio
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Italy, Milan
Description
This dataset is a spatial inventory of urban agriculture (UA) carried out in the city of Milan (Italy). UA areas were identified with a multi-step, iterative procedure using different web-mapping tools, especially multitemporal Google Earth images, and ancillary data such as Google Street View and Bing Maps.

License

Creative Commons CC-BY

Disclaimer

Despite our best efforts to validate the data, some information may be incorrect.

Description of the dataset

Typologies of UA

Residential garden: Private parcel near single houses (e.g. backyard), villas, buildings, industrial and commercial activities, generally managed by property owners. Cultivation is diversified ranging from leafy vegetables to herbs and fruit trees. Production is intended for self-consumption and/or for hobby purposes.

Community garden: A large area subdivided into multiple plots managed individually (i.e. allotment) or collectively by a group of people. Crop production is intended for self-consumption. Land is assigned by the Municipality; several cases of land cultivated without authorization are also common.

Urban farm: Parcel managed by professional farmers with an intensive and advanced cropping system. The cultivation can be specialized or oriented to high-diversity vegetables. The production is intended for market. The mapping procedure focuses on arable crops, horticulture, vineyards, olive groves and orchards.

Institutional garden: Parcel managed by institutions or organizations such as schools, religious centers, prisons and non-profit organizations. The production is generally intended for self-consumption and less frequently for trade. Several gardens in this category are intended for social purposes (e.g. recreation, education, etc.).

Illegal garden: Isolated parcel cultivated without authorization, organized and managed individually or by a few people. These gardens are located on unused or abandoned areas owned by public bodies or private subjects. The production is intended for self-consumption.

Nurseries: A large area subdivided into multiple plots managed for growing ornamental plants and flowers.

Land use typologies

Horticulture: annual crops generally seed sown in spring or summer (tomatoes, lettuce, zucchini, cucumbers, peppers).

Vineyard: grape vines grown in order to produce wine or table grape.

Olive groves: olive trees grown in order to produce olive oil or table olives.

Orchards: mixed trees such as orange, stone fruit, pome fruit, olive trees.

Mixed crops: an area grown with a mix of horticulture crops and fruit trees, not divisible.

Nurseries: ornamental plants, trees, flowers.

Credit

Pulighe G., Lupia F. (2019) Multitemporal Geospatial Evaluation of Urban Agriculture and (Non)-Sustainable Food Self-Provisioning in Milan, Italy. Sustainability 2019, 11(7), 1846

https://www.mdpi.com/2071-1050/11/7/1846
Data from: Reliance on Science in Patenting
zenodo.org
explore.openaire.eu
pdf, tsv, zip
Updated Jul 22, 2024
Cite
Matt Marx; Aaron Fuegi (2024). Reliance on Science in Patenting [Dataset]. http://doi.org/10.5281/zenodo.3382981
Explore at:
Available download formats: pdf, zip, tsv
Unique identifier
https://doi.org/10.5281/zenodo.3382981
Dataset updated
Jul 22, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Matt Marx; Aaron Fuegi
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
This dataset contains citations from USPTO patents granted 1947-2018 to articles captured by the Microsoft Academic Graph (MAG) from 1800-2018. If you use the data, please cite these two papers:

for the dataset of citations: Marx, Matt and Aaron Fuegi, "Reliance on Science in Patenting: USPTO Front-Page Citations to Scientific Articles" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331686)

for the underlying dataset of papers: Sinha, Arnab, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246.

The main file, pcs.tsv, contains the resolved citations. Fields are tab-separated. Each match has the patent number, MAG ID, the original citation from the patent, an indicator for whether the citation was supplied by the applicant, examiner, or unknown, and a confidence score (1-10) indicating how likely this match is correct. Note that this distribution does not contain matches with confidence 2 or 1.
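
A minimal sketch of reading such a tab-separated file and keeping only high-confidence matches follows. The column order mirrors the order in which the fields are listed above, but that order, the absence of a header row, the internal field names, the threshold of 8, and all sample values are assumptions; check _relianceonscience.pdf for the authoritative layout.

```python
import csv
import io

# Assumed column order, following the order the fields are listed in the text.
FIELDS = ["patent", "mag_id", "citation", "source", "confidence"]

def high_confidence_matches(handle, min_confidence=8):
    """Yield matches whose confidence score is at least min_confidence."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for values in reader:
        record = dict(zip(FIELDS, values))
        record["confidence"] = int(record["confidence"])
        if record["confidence"] >= min_confidence:
            yield record

# Made-up rows for illustration; assumes the file has no header row.
SAMPLE = (
    "US1234567\t111\tDoe J, Nature 1999\tapplicant\t10\n"
    "US7654321\t222\tRoe A, Science 2001\texaminer\t4\n"
)
matches = list(high_confidence_matches(io.StringIO(SAMPLE)))
print(matches)
```

Raising `min_confidence` trades recall for precision. Since the distribution already omits matches scored 1 or 2, any threshold of 3 or lower keeps every row.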

There is also a PubMed-specific match in pcs-pubmed.tsv.

The remaining files are a redistribution of the 1 January 2019 release of the Microsoft Academic Graph. All of these files are compressed using ZIP compression under CentOS5. Original files, documented at https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema, can be downloaded from https://aka.ms/msracad; this redistribution carves up the original files into smaller, variable-specific files that can be loaded individually (see _relianceonscience.pdf for full details).

Source code for generating the patent citations to science in pcs.tsv is available at https://github.com/mattmarx/reliance_on_science. Source code for generating jif.zip and jcif.zip (Journal Impact Factor and Journal Commercial Impact Factor) is at https://github.com/mattmarx/jcif.

Although MAG contains authors and affiliations for each paper, it does not contain the location for affiliations. We have created a dataset of locations for affiliations appearing at least 100 times using Bing Maps and Google Maps; however, it is unclear to us whether the API licensing terms allow us to repost their data. In any case, you can download our source code for doing so here: https://github.com/ksjiaxian/api-requester-locations.

MAG extracts field keywords for each paper (paperfieldid.zip and fieldidname.zip), more than 200,000 fields in all! When looking to study industries or technical areas, you might find this a bit overwhelming. We mapped the MAG subjects to six OECD fields and 39 subfields, defined here: http://www.oecd.org/science/inno/38235147.pdf. Clarivate provides a crosswalk between the OECD classifications and Web of Science fields, so we include WoS fields as well. This file is magfield_oecd_wos_crosswalk.zip.
data-analysis-ai-agent
huggingface.co
Updated Apr 3, 2025
+ more versions
Cite
DeepNLP (2025). data-analysis-ai-agent [Dataset]. https://huggingface.co/datasets/DeepNLP/data-analysis-ai-agent
Dataset updated
Apr 3, 2025
Authors
DeepNLP
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Data Analysis Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP

This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org, which contains AI agents' meta information such as each agent's name, website, and description, as well as monthly updated Web performance metrics, including Google and Bing average search ranking positions, GitHub Stars, arXiv references, etc. The dataset is helpful for AI… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/data-analysis-ai-agent.
Australian National Data Service
cloud.csiss.gmu.edu
gimi9.com
+2more
html, xml
Updated Dec 13, 2019
+ more versions
Cite
Australia (2019). Australian National Data Service [Dataset]. https://cloud.csiss.gmu.edu/uddi/uk_UA/dataset/australian-national-data-service
Explore at:
Available download formats: html, xml
Dataset updated
Dec 13, 2019
Dataset provided by
Australia
Area covered
Australia
Description
Research Data Australia is an Internet-based collection designed to promote visibility of Australian research data in search engines such as Google and Bing. Research Data Australia aims to provide a comprehensive window into the Australian Research Data Commons. It provides connections between data, projects, researchers and services across organisations and discipline

Research is producing larger and more complex data than ever before. It is imperative that these data outputs are effectively managed and shared. Better data – better described, more connected, more integrated and organised, more accessible, more easily used for new purposes – allows new questions to be investigated, larger issues to be investigated, and data landscapes to be explored.
Dataset visualization service: Land Use sc. 1:10000 - Ed. 2015
ckan.mobidatalab.eu
wms
Updated Apr 28, 2023
+ more versions
Cite
GeoDatiGovIt RNDT (2023). Dataset visualization service: Land Use sc. 1:10000 - Ed. 2015 [Dataset]. https://ckan.mobidatalab.eu/dataset/land-use-dataset-visualization-service-sc-1-10000-ed-2015
Explore at:
Available download formats: wms
Dataset updated
Apr 28, 2023
Dataset provided by
GeoDatiGovIt RNDT
Description
The 2015 edition of the Land Use keeps the classification unchanged and all the classes of the level have been updated. Updated polygons show the date of the modification. The natural color aerial images of 2013 and infrared of 2010, provided by AGEA and the high resolution images of Google Earth and Bing Maps 2015 were analysed. The Land Use update sc. 1:10000 - Ed. 2018. - Coverage - Entire Regional Territory - Origin - Photoanalysis and photointerpretation of Agea 2013 digital aerial images and high resolution Google Earth and Bing Maps 2015 satellite images.
TamperedNews & News400 (IJMIR'21 Update)
data.uni-hannover.de
partaa, partab +5
Updated May 17, 2022
Cite
TIB (2022). TamperedNews & News400 (IJMIR'21 Update) [Dataset]. https://data.uni-hannover.de/dataset/tamperednews-news400-ijmir21
Explore at:
Available download formats: tar.gz(36324241), tar.gz(43558405), tar.gz(304532), partac(500000000), partab(500000000), partad(500000000), partaa(500000000), tar(43282945), partad(370561409), tar(10547367), partae(445522586)
Dataset updated
May 17, 2022
Dataset authored and provided by
TIB
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
License information was derived automatically
Description
Multimodal Analytics for Real-world News using Measures of Cross-modal Entity Consistency

This repository contains the TamperedNews and News400 datasets introduced in the paper:

Eric Müller-Budack, Jonas Theiner, Sebastian Diering, Maximilian Idahl, Sherzod Hakimov and Ralph Ewerth. "Multimodal news analytics using measures of cross-modal entity and context consistency". In: International Journal of Multimedia Information Retrieval 10.2 (2021), Springer, pp. 111–125. DOI: https://doi.org/10.1007/s13735-021-00207-4

Content

For both datasets TamperedNews and News400, we provide the:

*dataset*.tar.gz containing the *dataset*.jsonl with

Web links to the news texts

Web links to the news image

Outputs of the named entity recognition and disambiguation (NERD) approach

Untampered and tampered entities

*dataset*_features.tar.gz with visual features for events, locations, and persons

news400_wordembeddings.tar.gz: Word embeddings of all nouns in the news texts of the News400 dataset

Please note that the word embeddings of the TamperedNews dataset (tamperednews_wordembeddings.tar.gz) have been already provided in the first version (Link).

For all entities detected in both datasets, we provide:

entities.tar.gz containing an *entity_type*.jsonl for all entity types (events, locations, and persons) with:

Wikidata ID

Wikidata label

Meta information used for tampering

Web links to all reference images crawled from Google, Bing, and Wikidata

entities_features.tar.gz containing the visual features of the reference images for all entities
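
Since the archives above are described as tar.gz files holding JSON Lines, a minimal way to stream their records without unpacking to disk might look like the following sketch. The helper name `iter_jsonl_from_tar` is hypothetical, and the in-memory demo archive and its `id` field are invented for illustration; the real files' record schema is not shown here.

```python
import io
import json
import tarfile


def iter_jsonl_from_tar(fileobj, member_suffix=".jsonl"):
    """Yield (member_name, record) for each JSON line in matching members."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(member_suffix):
                continue
            extracted = archive.extractfile(member)
            for raw_line in extracted:
                line = raw_line.strip()
                if line:  # ignore blank lines
                    yield member.name, json.loads(line)


# Build a tiny stand-in archive in memory (invented content, for the demo only).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    payload = b'{"id": 1}\n{"id": 2}\n'
    info = tarfile.TarInfo(name="demo.jsonl")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

records = list(iter_jsonl_from_tar(buffer))
```

For a downloaded archive, the same function could be given a file opened in binary mode instead of the in-memory buffer.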

Source Code

The source code to reproduce our results as well as download scripts to crawl news texts and images can be found on our GitHub page: https://github.com/TIBHannover/cross-modal_entity_consistency
Forests of Australia (2023)
digital.atlas.gov.au
Updated Sep 4, 2024
+ more versions
Cite
Digital Atlas of Australia (2024). Forests of Australia (2023) [Dataset]. https://digital.atlas.gov.au/datasets/forests-of-australia-2023/about
Dataset updated
Sep 4, 2024
Dataset authored and provided by
Digital Atlas of Australia
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered

Description
Abstract

Forests of Australia (2023) is a continental spatial dataset of forest extent, by national forest categories and types, assembled for Australia's State of the Forests Report. It was developed from multiple forest, vegetation and land cover data inputs, including contributions from Australian, state and territory government agencies and external sources.

A forest is defined in this dataset as "An area, incorporating all living and non-living components, that is dominated by trees having usually a single stem and a mature or potentially mature stand height exceeding two metres and with existing or potential crown cover of overstorey strata about equal to or greater than 20 per cent. This includes Australia's diverse native forests and plantations, regardless of age. It is also sufficiently broad to encompass areas of trees that are sometimes described as woodlands".

The dataset was compiled by the Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES) for the National Forest Inventory (NFI), a collaborative partnership between the Australian and state and territory governments. The role of the NFI is to collate, integrate and communicate information on Australia's forests. State and territory government agencies collect forest data using independent methods and at varying scales or resolutions. The NFI applies a national classification to state and territory data to allow seamless integration of these datasets. Multiple independent sources of external data are used to fill data gaps and improve the quality of the final dataset.

The NFI classifies forests into three national forest categories (Native Forest, Commercial plantation, and other forest) and then into various forest types. Commercial plantations presented in this dataset were sourced from the National Plantation Inventory (NPI) spatial dataset (2021), also produced by ABARES.

Another dataset produced by ABARES, the Catchment scale land use of Australia CLUM dataset (2020), was used to identify and mask out land uses that are inappropriate to map as forest.

The Forests of Australia (2023) dataset is produced to fulfil requirements of Australia's National Forest Policy Statement and the Regional Forests Agreement Act 2002 (Cwth) and is used by the Australian Government for domestic and international reporting.

Previous versions of this dataset are available on the Forests Australia website spatial data page and the Australian Government open government data portal data.gov.au.

Currency

Date modified: 30 November 2023
Modification frequency: Every 5 years

Data extent

Spatial extent
North: -8.2°
South: -44.4°
East: 157.2°
West: 109.5°

Source information

Data, Metadata, Maps and Interactive views are available from ABARES website.

Forests of Australia (2023) – Descriptive metadata.

The data was obtained from Department of Agriculture, Fisheries and Forestry - Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES). ABARES is providing this data to the public under a Creative Commons Attribution 4.0 license.

Lineage statement

Presented on this page is a summarised lineage on the development of state and territory datasets for Forests of Australia (2023). The dataset has been produced using the Multiple Lines of Evidence (MLE) method for publication in the Australia’s State of the Forests Report – 2023 update. Detailed lineage information can be found here.

Forests of Australia (2023) is a continental spatial dataset of forest extent, by national forest categories and types, assembled for Australia's State of the Forests Report – 2023 update. It was developed from multiple forest, vegetation and land cover data inputs, including contributions from Australian, state and territory government agencies and external sources.

For each state or territory, except for the ACT where there was no new data, intersection of the Forests of Australia (2018) dataset with a forest cover dataset supplied by the jurisdiction, and with other available and appropriate independent forest cover datasets, identified:

High confidence areas – areas where all the examined datasets agreed with the Forests of Australia (2018) dataset that the areas were forest or non-forest. No further assessment was required for these areas.

Moderate confidence areas – areas where the Forests of Australia (2018) dataset agreed with the forest cover dataset supplied by state or territory, and with external or independent datasets, that the areas were forest or non-forest. These areas were identified as potential errors and needed further analysis in order to determine the correct allocation (forest or non-forest). The required analyses and validation were conducted by ABARES, in consultation with relevant state and territory agencies, using various ancillary data including high-resolution imagery such as World Imagery by ESRI, Bing Maps and Google Earth Pro.

Low confidence areas – areas where the Forests of Australia (2018) dataset disagreed with the forest cover dataset supplied by state or territory, and with external or independent datasets, that the areas were forest or non-forest. All such areas were identified as potential errors and needed further analysis in order to determine the correct allocation (forest or non-forest). The required analyses and validation were conducted by ABARES, in consultation with relevant state and territory agencies, using various ancillary data including high-resolution imagery such as World Imagery by ESRI, Bing Maps and Google Earth Pro.

External or independent datasets used include:

H_Woody_Fuzzy_2_Class dataset is based on the NGGI dataset produced by DCCEEW from Landsat data and was developed to support New South Wales Natural Resources Commission’s (NRC) Monitoring, Evaluation and Reporting Program. NRC applied Fuzzy Logic and Probability modelling to the NGGI dataset to derive annual layers distinguishing between forest and non-forest at 25 m raster resolution. Each of five annual layers, 2015 to 2019, was resampled to a 100 m raster by classifying as forest the 100 m pixels that had more than half their area as forest as determined from 25 m pixels. The five annual layers were combined and every pixel in the combination that had been classified as forest in any year during 2015-2019 period was allocated as forest (and the balance non-forest). This approach was taken to prevent areas where the crown cover had reduced temporarily below 20%, through events such as fire, harvesting, drought or disease, from being incorrectly classified as non-forest.

State-wide Land and Tree Study (SLATS) dataset is based on data collected by the Landsat satellite. This dataset was available for Queensland only. Foliage Projective Cover (FPC) values of 11 or greater (equivalent to crown cover 20% or greater) were considered as forest candidates in this SLATS dataset. The National Vegetation Information System (NVIS) version 6.0 dataset was used to identify areas in this SLATS dataset that met the height requirements of the forest definition used by the National Forest Inventory.

The National Greenhouse Gas Inventory (NGGI) dataset is produced from Landsat satellite Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+) and Operational Land Imager (OLI) images for the Australian Government Department of the Climate Change, Energy, the Environment and Water (DCCEEW), and identifies woody vegetation of height or potential height greater than 2 metres, crown cover greater than 20%, and with a minimum patch size of 0.2 hectares (DISER, 2021a). The dataset is compiled using time-series data since 1972 and is produced at a 25 m × 25 m resolution. The NGGI dataset used was developed from the five annual layers (2016-2020, inclusive) from the ‘National Forest and sparse woody vegetation data (Version 5.0)’ spatial dataset produced using the algorithms for land-use change allocation developed for the National Inventory Reports (DISER, 2021b). Each layer of the original 25 m resolution, three-class (forest, sparse woody and non-forest) dataset was resampled to a binary (forest and non-forest) 100 m raster by classifying as forest the 100 m pixels that had more than half their area as forest; the sparse woody and non-forest classes were combined into a non-forest class. The five annual layers were then combined and every pixel in the combination that had been classified as forest in any year during 2016-2020 period was allocated as forest (and the balance non-forest). This approach was taken to prevent areas where the crown cover had reduced temporarily below 20%, through events such as fire, harvesting, drought or disease, from being incorrectly classified as non-forest.

All input datasets were converted to 100m rasters (ESRI GRID format), aligning with relevant standard NFI state or territory masks (also known as NFI SNAP grids), in Albers projection. Where the input dataset was in polygon format, the Polygon to Raster tool was used to convert the polygon dataset to raster format, using the Maximum_Combined_Area option.

Validation assessment results were incorporated to give improved and high-confidence forest cover datasets for each state or territory.

Look-up tables translating the state or territory forest cover data to NFI forest types were used where provided. Where this information was not provided, it was derived by ABARES from translating Levels 5 and 6 of the National Vegetation Information System (NVIS) version 6.0 attribute information to NFI forest types.

This dataset has been converted from GeoTIFF to Multidimensional Cloud Raster Format (CRF) to facilitate publishing to the Digital Atlas of Australia (DAA).

Date of extraction: February 2024.

Data dictionary

Attribute name: Description
VALUE: Identifier of every unique combination of the following attributes: STATE, FOR_SOURCE, FOR_CODE, FOR_TYPE, FOR_CAT, HEIGHT and COVER.
COUNT: Number of cells that belong to a particular VALUE. For this dataset, in which cell resolution is 100 by 100 metres.
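
The resampling and layer-combination rule described in the lineage statement (a 100 m pixel is forest when more than half of its 25 m pixels are forest, and a pixel is forest overall if it was forest in any annual layer) can be illustrated with a small sketch. This is not the ABARES workflow, which used GIS tooling; the function names and the toy 4 × 4 grids are assumptions made purely for illustration.

```python
def resample_majority(grid, factor=4):
    """Aggregate a binary grid (1 = forest) into coarser cells.

    A coarse cell is forest only when strictly more than half of the
    fine cells it covers are forest. factor=4 corresponds to 25 m -> 100 m.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            coarse_row.append(1 if sum(block) * 2 > len(block) else 0)
        coarse.append(coarse_row)
    return coarse


def combine_any_year(layers):
    """Mark a cell as forest if it is forest in at least one layer."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [1 if any(layer[r][c] for layer in layers) else 0 for c in range(cols)]
        for r in range(rows)
    ]


# Toy annual layers: 9 of 16 fine cells are forest in year A, 8 of 16 in year B.
year_a = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0]]
year_b = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]

coarse_a = resample_majority(year_a)
coarse_b = resample_majority(year_b)
combined = combine_any_year([coarse_a, coarse_b])
```

In the toy data, year B falls exactly on the half-way mark and so is not forest on its own, but the combined result is still forest because year A qualifies, which mirrors the stated aim of not losing pixels whose cover dipped temporarily.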
elevation-data-ASTER-compressed-retiled
huggingface.co
Updated Aug 13, 2024
Cite
Francesco (2024). elevation-data-ASTER-compressed-retiled [Dataset]. https://huggingface.co/datasets/Upabjojr/elevation-data-ASTER-compressed-retiled
Dataset updated
Aug 13, 2024
Authors
Francesco
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
World elevation dataset

High resolution dataset containing the world elevation above the sea level in meters. See python example to get the estimated elevation from a coordinate.

Info

This dataset comprises global elevation data sourced from ASTER GDEM, which has been compressed and retiled for efficiency. The retiled data adheres to the common web map tile convention used by platforms such as OpenStreetMap, Google Maps, and Bing Maps, providing compatibility with zoom… See the full description on the dataset page: https://huggingface.co/datasets/Upabjojr/elevation-data-ASTER-compressed-retiled.
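
For readers unfamiliar with the web map tile convention mentioned above, the sketch below shows the widely used XYZ ("slippy map") conversion from a latitude/longitude pair to tile indices at a given zoom level. It assumes the standard Web Mercator tiling with the origin at the top-left; how this particular dataset names or stores its tiles is not stated in the excerpt, so treat the mapping to its files as an open question. The function name is hypothetical.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) tile indices in the common XYZ web map scheme."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates (e.g. lon = 180) stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


origin_tile = latlon_to_tile(0.0, 0.0, 1)
```

At zoom 1 the world is split into a 2 × 2 grid, and the point at latitude 0, longitude 0 falls in the tile at column 1, row 1.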
Data from: autonomous-agent
huggingface.co
Updated Dec 19, 2024
+ more versions
Cite
DeepNLP (2024). autonomous-agent [Dataset]. https://huggingface.co/datasets/DeepNLP/autonomous-agent
Dataset updated
Dec 19, 2024
Authors
DeepNLP
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Autonomous Agent Agent Meta and Traffic Dataset in AI Agent Marketplace | AI Agent Directory | AI Agent Index from DeepNLP

This dataset is collected from the AI Agent Marketplace Index and Directory at http://www.deepnlp.org, which contains AI agents' meta information such as each agent's name, website, and description, as well as monthly updated Web performance metrics, including Google and Bing average search ranking positions, GitHub Stars, arXiv references, etc. The dataset is helpful for… See the full description on the dataset page: https://huggingface.co/datasets/DeepNLP/autonomous-agent.
Data from: Datasets on soil and water conservation and restoration...
data.mel.cgiar.org
csv
Updated Jun 6, 2025
Cite
Pietro Bartolini; Claudio Zucca (2025). Datasets on soil and water conservation and restoration interventions in Jordan Badia - SWC database Jordan 2017 [Dataset]. https://data.mel.cgiar.org/dataset.xhtml?persistentId=hdl:20.500.11766.1/P3WRRC
Explore at:
Available download formats: csv(55642), csv(1581), csv(3712), csv(1716), csv(13216), csv(8817), csv(3868), csv(3290), csv(1610), csv(6536)
Dataset updated
Jun 6, 2025
Dataset provided by
MELDATA
Authors
Pietro Bartolini; Claudio Zucca
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Jordan
Dataset funded by
CGIAR (http://cgiar.org/)
Description
The dataset was created by UNIFI-ICARDA to list all the soil and water conservation and restoration interventions in the Badia region of Jordan. All the interventions were georeferenced using public domain remote sensing data. Google Earth (GE) was the principal source of information, although the image coverage of GE for Badia is heterogeneous and some Badia areas are covered only by old images. In these areas, a cross evaluation was done using Bing Satellite Imagery, where it offered more recent images. The identified interventions are divided into categories: 58 dams, 203 tanks, 841 check dams, 51 contour structure interventions, 27 reforestation sites, 30 desert restorations, and 115 spontaneous interventions. A dam is an artificial structure typically made of concrete or earth and built to impound water, creating a permanent or semi-permanent basin along a main waterway to support agriculture in the surrounding area. A tank is an artificial water pond built by private or governmental initiative in order to collect and stock seasonal water flows for agricultural/animal husbandry purposes. Check dams are linear structures made of earth (sometimes rock or concrete), built transversal to the stream flow in the ephemeral river bed in order to slow down the flow of water and to increase groundwater recharge, as well as for agricultural and anti-erosion purposes. Contour structures consist of linear micro-catchment interventions characterized by ridges and furrows traced along the contour lines, where drought-tolerant shrub species are planted in ridges to provide a source of fodder for livestock. They are divided into continuous and discontinuous types. Reforestation interventions consist of vast land works following the contour lines along the slopes, where vegetation was established.
Desert restoration interventions include three different types of interventions: proper desert restoration such as dune fixation structures, afforestation interventions, and aquifer recharge structures. Spontaneous hydraulic interventions seem to be similar in purpose to check dams, although they show a more irregular structure and distribution pattern. These structures are often closely connected with other agricultural structures and are probably linked to private initiative. The mapped dams and tanks were compared with two other datasets generated by Jordanian institutions: the CAP database (list of Badia Restoration Program interventions) and the WAJ database (list of Jordan Valley Authority interventions). This resulted in some correspondences, although the comparison was only partially possible because the available satellite images were in several cases too old, particularly in eastern Badia. In other cases, even where recent images were available, the structures were not found (neither on GE nor on Bing), revealing possible errors in the coordinates indicated by the two datasets.
Data accompanying the manuscript The silence of the LLMs: Cross-lingual...
figshare.com
txt
Updated Nov 25, 2024
Cite
Aleksandra Urman; Mykola Makhortykh (2024). Data accompanying the manuscript The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat [Dataset]. http://doi.org/10.6084/m9.figshare.24088407.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24088407.v1
Dataset updated
Nov 25, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Aleksandra Urman; Mykola Makhortykh
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset of the queries and chatbot responses for the paper The silence of the LLMs: Cross-lingual analysis of guardrail-related political bias and false information prevalence in ChatGPT, Google Bard (Gemini), and Bing Chat
NatQuest
huggingface.co
Updated Oct 28, 2024
Cite
Causal NLP (2024). NatQuest [Dataset]. https://huggingface.co/datasets/causal-nlp/NatQuest
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 28, 2024
Authors
Causal NLP
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
Dataset Card for NatQuest

NatQuest is a dataset of natural questions asked by online users on ChatGPT, Google, Bing and Quora. Repository: See the Repo for more details about the dataset.

Source Data

The sources of the data are the following:

MSMarco, NaturalQuestions, Quora Question Pairs, ShareGPT, WildChat

Recommendations

Be aware that the dataset might contain NSFW content. Also, as discussed in the paper, the filtering procedures of each source might… See the full description on the dataset page: https://huggingface.co/datasets/causal-nlp/NatQuest.


Cite

Víctor Herrero Solana; Luis Leonardo Arboledas Márquez; Elisa Legerén-Álvarez (2013). News of CanalUGR tracked on Google News, Yahoo! News and Bing News [Dataset]. http://doi.org/10.30827/digibug.31174

Data from: News of CanalUGR tracked on Google News, Yahoo! News and Bing News

Explore at:

23 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://doi.org/10.30827/digibug.31174

Dataset updated

Jul 3, 2013

Authors

Víctor Herrero Solana; Luis Leonardo Arboledas Márquez; Elisa Legerén-Álvarez

Description

Dataset contains 613 news of CanalUGR (University of Granada Communication Office) tracked on the main online news aggregators (Google News, Yahoo! News and Bing News). We include: number in CanalUGR, media, country, type.
