25 datasets found
  1. DataForSEO Google Full (Keywords+SERP) database, historical data available

    • datarade.ai
    .json, .csv
    Updated Aug 17, 2023
    Cite
    DataForSEO (2023). DataForSEO Google Full (Keywords+SERP) database, historical data available [Dataset]. https://datarade.ai/data-products/dataforseo-google-full-keywords-serp-database-historical-d-dataforseo
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Aug 17, 2023
    Dataset provided by
    Authors
    DataForSEO
    Area covered
    Sweden, Bolivia (Plurinational State of), Burkina Faso, Portugal, Paraguay, South Africa, Costa Rica, United Kingdom, Côte d'Ivoire, Cyprus
    Description

    You can check the field descriptions in the documentation: current Full database: https://docs.dataforseo.com/v3/databases/google/full/?bash; Historical Full database: https://docs.dataforseo.com/v3/databases/google/history/full/?bash.

    Full Google Database is a combination of the Advanced Google SERP Database and Google Keyword Database.

    Google SERP Database offers millions of SERPs collected in 67 regions with most of Google’s advanced SERP features, including featured snippets, knowledge graphs, people also ask sections, top stories, and more.

    Google Keyword Database encompasses billions of search terms enriched with related Google Ads data: search volume trends, CPC, competition, and more.

    This database is available in JSON format only.

    You don’t have to download fresh data dumps in JSON – we can deliver data straight to your storage or database. We send terabytes of data to dozens of customers every month using Amazon S3, Google Cloud Storage, Microsoft Azure Blob, Elasticsearch, and Google BigQuery. Let us know if you’d like to get your data to any other storage or database.

  2. Evolution of Web search engine interfaces through SERP screenshots and HTML...

    • rdm.inesctec.pt
    Updated Jul 26, 2021
    Cite
    (2021). Evolution of Web search engine interfaces through SERP screenshots and HTML complete pages for 20 years - Dataset - CKAN [Dataset]. https://rdm.inesctec.pt/dataset/cs-2021-003
    Explore at:
    Dataset updated
    Jul 26, 2021
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of “10 blue links” has evolved into richer interfaces, often personalized to the search query, the user, and other aspects. We used the most searched queries by year to extract a representative sample of SERP from the Internet Archive. The Internet Archive has been keeping snapshots and the respective HTML versions of webpages over time, and its collection contains more than 50 billion webpages. We used Python and Selenium WebDriver for browser automation to visit each capture online, check if the capture is valid, save the HTML version, and generate a full screenshot.

    The dataset contains all the extracted captures. Each capture is represented by a screenshot, an HTML file, and a folder of associated files. For file naming, we concatenate the initial of the search engine (G) with the capture's timestamp. The filename ends with a sequential integer "-N" if the timestamp is repeated. For example, "G20070330145203-1" identifies a second capture from Google on March 30, 2007; the first is identified by "G20070330145203".

    Using this dataset, we analyzed how SERP evolved in terms of content, layout, design (e.g., color scheme, text styling, graphics), navigation, and file size. We have registered the appearance of SERP features and analyzed the design patterns involved in each SERP component. We found that the number of elements in SERP has been rising over the years, demanding a more extensive interface area and larger files. This systematic analysis portrays evolution trends in search engine user interfaces and, more generally, web design. We expect this work will trigger other, more specific studies that can take advantage of the dataset we provide here. This graphic represents the diversity of captures by year and search engine (Google and Bing).
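
    The capture naming convention above lends itself to simple parsing. Below is a minimal, illustrative Python sketch (the regex and helper are ours, not part of the dataset; it assumes the single-letter engine initial plus 14-digit timestamp format described above):

    import re
    from datetime import datetime

    # Capture names follow <engine initial><14-digit timestamp>[-<sequence>],
    # e.g. "G20070330145203-1" (G = Google; the dataset also includes Bing captures).
    CAPTURE_RE = re.compile(r"^(?P<engine>[A-Z])(?P<ts>\d{14})(?:-(?P<seq>\d+))?$")

    def parse_capture(name: str) -> dict:
        m = CAPTURE_RE.fullmatch(name)
        if m is None:
            raise ValueError(f"unrecognized capture name: {name!r}")
        return {
            "engine": m.group("engine"),
            "timestamp": datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S"),
            "sequence": int(m.group("seq") or 0),  # 0 = first capture for this timestamp
        }

    print(parse_capture("G20070330145203-1"))
    # {'engine': 'G', 'timestamp': datetime.datetime(2007, 3, 30, 14, 52, 3), 'sequence': 1}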

  3. Google Trends and Wikipedia Page Views

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Jan 24, 2020
    Cite
    Mitsuo Yoshida (2020). Google Trends and Wikipedia Page Views [Dataset]. http://doi.org/10.5281/zenodo.14539
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mitsuo Yoshida
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract (our paper)

    The frequency of a web search keyword generally reflects the degree of public interest in a particular subject matter. Search logs are therefore useful resources for trend analysis. However, access to search logs is typically restricted to search engine providers. In this paper, we investigate whether search frequency can be estimated from a different resource such as Wikipedia page views of open data. We found frequently searched keywords to have remarkably high correlations with Wikipedia page views. This suggests that Wikipedia page views can be an effective tool for determining popular global web search trends.

    Data

    personal-name.txt.gz:
    The first column is the Wikipedia article id, the second column is the search keyword, the third column is the Wikipedia article title, and the fourth column is the total page views from 2008 to 2014.

    personal-name_data_google-trends.txt.gz, personal-name_data_wikipedia.txt.gz:
    The first column is the period to be collected, the second column is the source (Google or Wikipedia), the third column is the Wikipedia article id, the fourth column is the search keyword, the fifth column is the date, and the sixth column is the value of search trend or page view.
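
    A minimal sketch for loading these files with pandas (column names are ours, derived from the descriptions above; tab separation is an assumption, so adjust the sep argument if the files use a different delimiter):

    import pandas as pd

    # personal-name.txt.gz: article id, keyword, article title, total page views
    names = pd.read_csv(
        "personal-name.txt.gz", sep="\t", header=None,
        names=["article_id", "keyword", "article_title", "total_page_views"],
    )

    # personal-name_data_google-trends.txt.gz (the Wikipedia file has the same layout):
    # period, source, article id, keyword, date, value
    trends = pd.read_csv(
        "personal-name_data_google-trends.txt.gz", sep="\t", header=None,
        names=["period", "source", "article_id", "keyword", "date", "value"],
    )

    # e.g., the Google Trends series for the first keyword in the file
    first = trends["keyword"].iloc[0]
    print(trends.loc[trends["keyword"] == first, ["date", "value"]].head())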

    Publication

    This data set was created for our study. If you make use of this data set, please cite:
    Mitsuo Yoshida, Yuki Arase, Takaaki Tsunoda, Mikio Yamamoto. Wikipedia Page View Reflects Web Search Trend. Proceedings of the 2015 ACM Web Science Conference (WebSci '15). no.65, pp.1-2, 2015.
    http://dx.doi.org/10.1145/2786451.2786495
    http://arxiv.org/abs/1509.02218 (author-created version)

    Note

    The raw data of Wikipedia page views is available on the following page.
    http://dumps.wikimedia.org/other/pagecounts-raw/

  4. h

    google_search_terms_training_data

    • huggingface.co
    Updated Jul 28, 2024
    Cite
    Hoshang Chenoy (2024). google_search_terms_training_data [Dataset]. https://huggingface.co/datasets/hoshangc/google_search_terms_training_data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2024
    Authors
    Hoshang Chenoy
    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

      Dataset Details

      Dataset Description

    Dataset Name: Google Search Trends Top Rising Search Terms

    Description: The Google Search Trends Top Rising Search Terms dataset provides valuable insights into the most rapidly growing search queries on the Google search engine. It offers a comprehensive collection of trending search… See the full description on the dataset page: https://huggingface.co/datasets/hoshangc/google_search_terms_training_data.

  5. u

    Data from: Inventory of online public databases and repositories holding...

    • agdatacommons.nal.usda.gov
    • s.cnmilf.com
    • +4more
    txt
    Updated Feb 8, 2024
    + more versions
    Cite
    Erin Antognoli; Jonathan Sears; Cynthia Parr (2024). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. http://doi.org/10.15482/USDA.ADC/1389839
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Erin Antognoli; Jonathan Sears; Cynthia Parr
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose: As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data-sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

    • establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals
    • compare how much data is in institutional vs. domain-specific vs. federal platforms
    • determine which repositories are recommended by top journals that require or recommend the publication of supporting data
    • ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

    Approach: The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and hold some amount of ag data, we analysed resources including re3data, libguides, and ARS lists. Primarily environmental or public-health databases were not included, but places where ag grantees would publish data were considered.
    Search methods: We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.

    We then used search engines such as Bing and Google to find top agricultural university repositories, using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if the institution had a repository for its unique, independent research data, if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

    Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General-subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

    Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results from ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The author instructions of the top 50 journals were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on the 2012 and 2016 study of where USDA employees publish their research, ranked by number of articles, including 2015/2016 Impact Factor, author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
    Evaluation: We ran a series of searches on all resulting general-subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results: A summary of the major findings from our data review:

    Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See the included README file for descriptions of each individual data file in this dataset. Resources in this dataset:

    • Resource Title: Journals. File Name: Journals.csv
    • Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    • Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    • Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    • Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    • Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    • Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt

  6. Data from: Qbias – A Dataset on Media Bias in Search Queries and Query...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 1, 2023
    Cite
    Haak, Fabian (2023). Qbias – A Dataset on Media Bias in Search Queries and Query Suggestions [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7682914
    Explore at:
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Schaer, Philipp
    Haak, Fabian
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present Qbias, two novel datasets that promote the investigation of bias in online news search as described in

    Fabian Haak and Philipp Schaer. 2023. Qbias - A Dataset on Media Bias in Search Queries and Query Suggestions. In Proceedings of the ACM Web Science Conference (WebSci’23). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3578503.3583628.

    Dataset 1: AllSides Balanced News Dataset (allsides_balanced_news_headlines-texts.csv)

    The dataset contains 21,747 news articles collected from AllSides balanced news headline roundups in November 2022, as presented in our publication. The AllSides balanced news feature three expert-selected U.S. news articles from sources of different political views (left, right, center), often featuring spin, slant, and other forms of non-neutral reporting on political news. All articles are tagged with a bias label by four expert annotators based on the expressed political partisanship: left, right, or neutral. The AllSides balanced news aims to offer multiple political perspectives on important news stories, educate users on biases, and provide multiple viewpoints. Collected data further includes headlines, dates, news texts, topic tags (e.g., "Republican party", "coronavirus", "federal jobs"), and the publishing news outlet. We also include AllSides' neutral description of the topic of the articles. Overall, the dataset contains 10,273 articles tagged as left, 7,222 as right, and 4,252 as center.

    To provide easier access to the most recent and complete version of the dataset for future research, we provide a scraping tool and a regularly updated version of the dataset at https://github.com/irgroup/Qbias. The repository also contains regularly updated, more recent versions of the dataset with additional tags (such as the URL to the article). We chose to publish the version used for fine-tuning the models on Zenodo to enable the reproduction of the results of our study.

    Dataset 2: Search Query Suggestions (suggestions.csv)

    The second dataset we provide consists of 671,669 search query suggestions for root queries based on tags of the AllSides biased news dataset. We collected search query suggestions from Google and Bing for the 1,431 topic tags that have been used for tagging AllSides news at least five times (approximately half of the total number of topics). The topic tags include names, a wide range of political terms, agendas, and topics (e.g., "communism", "libertarian party", "same-sex marriage"), cultural and religious terms (e.g., "Ramadan", "pope Francis"), locations, and other news-relevant terms. On average, the dataset contains 469 search queries for each topic. In total, 318,185 suggestions have been retrieved from Google and 353,484 from Bing.

    The file contains a "root_term" column based on the AllSides topic tags. The "query_input" column contains the search term submitted to the search engine ("search_engine"). "query_suggestion" and "rank" represent the search query suggestions and their positions as returned by the search engines at the given time of search ("datetime"). We scraped our data from a US server; the scraping location is saved in "location".

    We retrieved ten search query suggestions provided by the Google and Bing search autocomplete systems for the input of each of these root queries, without performing a search. Furthermore, we extended the root queries by the letters a to z (e.g., "democrats" (root term) >> "democrats a" (query input) >> "democrats and recession" (query suggestion)) to simulate a user's input during information search and generate a total of up to 270 query suggestions per topic and search engine. The dataset we provide contains columns for root term, query input, and query suggestion for each suggested query. The location from which the search is performed is the location of the Google servers running Colab, in our case Iowa in the United States of America, which is added to the dataset.
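
    As a minimal sketch of working with suggestions.csv (column names as documented above; "communism" is one of the example topic tags mentioned earlier):

    import pandas as pd

    df = pd.read_csv("suggestions.csv")

    # How many suggestions each engine contributed
    # (318,185 from Google and 353,484 from Bing, per the description)
    print(df.groupby("search_engine")["query_suggestion"].count())

    # Top-ranked suggestions for one root term
    sub = df[(df["root_term"] == "communism") & (df["rank"] == 1)]
    print(sub[["search_engine", "query_input", "query_suggestion"]].head())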

    AllSides Scraper

    At https://github.com/irgroup/Qbias, we provide a scraping tool, that allows for the automatic retrieval of all available articles at the AllSides balanced news headlines.

    We want to provide an easy means of retrieving the news and all corresponding information. For many tasks it is relevant to have the most recent documents available. Thus, we provide this Python-based scraper, that scrapes all available AllSides news articles and gathers available information. By providing the scraper we facilitate access to a recent version of the dataset for other researchers.

  7. Just Google It - Digital Research Practices of Humanities Scholars - Dataset...

    • b2find.eudat.eu
    Updated Jul 2, 2013
    Cite
    (2013). Just Google It - Digital Research Practices of Humanities Scholars - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/bc375396-fb76-5c42-8f96-186b13678956
    Explore at:
    Dataset updated
    Jul 2, 2013
    Description

    The transition from analog to digital archives and the recent explosion of online content offer researchers novel ways of engaging with data. The crucial question for ensuring a balance between the supply and demand side of data is whether this trend connects to existing scholarly practices and to the average search skills of researchers. To gain insight into this process, a survey was conducted among nearly three hundred (N = 288) humanities scholars in the Netherlands and Belgium with the aim of answering the following questions: 1) To what extent are digital databases and archives used? 2) What are the preferences in search functionalities? 3) Are there differences in search strategies between novices and experts of information retrieval?

    Our results show that while scholars actively engage in research online, they mainly search for text and images. General search systems such as Google and JSTOR are predominant, while large-scale collections such as Europeana are rarely consulted. Searching with keywords is the dominant search strategy, and advanced search options are rarely used. When comparing novice and more experienced searchers, the former tend to have a narrower selection of search engines and mostly use keywords.

    Our overall findings indicate that Google is the key player among available search engines. This dominant use illustrates the paradoxical attitude of scholars toward Google: while transparency of provenance and selection are deemed key academic requirements, the workings of the Google algorithm remain unclear. We conclude that Google introduces a black box into digital scholarly practices, indicating scholars will become increasingly dependent on such black-boxed algorithms. This calls for a reconsideration of the academic principles of provenance and context.

  8. Transparency in Keyword Faceted Search: a dataset of Google Shopping html...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2020
    Cite
    Cozza Vittoria; Hoang Van Tien; Petrocchi Marinella; De Nicola Rocco (2020). Transparency in Keyword Faceted Search: a dataset of Google Shopping html pages [Dataset]. http://doi.org/10.5281/zenodo.1491557
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Cozza Vittoria; Hoang Van Tien; Petrocchi Marinella; De Nicola Rocco
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains a collection of around 2,000 HTML pages: these web pages contain the search results returned for queries for different products, searched by a set of synthetic users surfing Google Shopping (US version) from different locations in July 2016.

    Each file in the collection has a name indicating the location from which the search was done, the userID, and the searched product: no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html

    The locations are Philippines (PHI), United States (US), and India (IN). The userIDs are 26 to 30 for users searching from the Philippines, 1 to 5 from the US, and 11 to 15 from India.

    Products were chosen following 130 keywords (e.g., MP3 player, MP4 Watch, Personal organizer, Television, etc.).
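
    The filename convention above can be unpacked programmatically; a minimal, illustrative sketch (the example filename is hypothetical, constructed from the pattern and the ID ranges given above):

    import re

    # no_email_LOCATION_USERID.PRODUCT.shopping_testing.#.html
    FNAME_RE = re.compile(
        r"^no_email_(?P<location>PHI|US|IN)_(?P<user_id>\d+)"
        r"\.(?P<product>.+?)\.shopping_testing\.(?P<num>\d+)\.html$"
    )

    def parse_result_file(fname: str):
        m = FNAME_RE.match(fname)
        return m.groupdict() if m else None

    print(parse_result_file("no_email_US_3.Television.shopping_testing.2.html"))
    # {'location': 'US', 'user_id': '3', 'product': 'Television', 'num': '2'}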

    In the following, we describe how the search results have been collected.

    Each user has a fresh profile. The creation of a new profile corresponds to launching a new, isolated web browser client instance and opening the Google Shopping US web page.

    To mimic real users, the synthetic users can browse, scroll pages, stay on a page, and click on links.

    A fully-fledged web browser is used to get the correct desktop version of the website under investigation. This is because websites could be designed to behave according to user agents, as witnessed by the differences between the mobile and desktop versions of the same website.

    The prices are the retail ones displayed by Google Shopping in US dollars (thus, excluding shipping fees).

    Several frameworks have been proposed for interacting with web browsers and analysing results from search engines. This research adopts OpenWPM, which is automated with Selenium to efficiently create and manage different users with isolated Firefox and Chrome client instances, each with their own associated cookies.

    The experiments run, on average, for 24 hours. In each of them, the software runs on our local server, but the browser's traffic is redirected to the designated remote servers (e.g., to India) via tunneling through SOCKS proxies. This way, all commands are simultaneously distributed over all proxies. The experiments adopt the Mozilla Firefox browser (version 45.0) for the web browsing tasks and run under Ubuntu 14.04. For each query, we consider the first page of results, counting 40 products; the focus of the experiments is mostly on the top 10 and top 3 results.
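
    As a rough illustration of the proxy redirection described above, here is a modern Selenium equivalent (not the authors' exact OpenWPM configuration; the proxy host and port are placeholders):

    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    # Route all browser traffic through a SOCKS proxy, mirroring the tunneling setup.
    opts = Options()
    opts.set_preference("network.proxy.type", 1)            # manual proxy configuration
    opts.set_preference("network.proxy.socks", "10.0.0.5")  # placeholder proxy host
    opts.set_preference("network.proxy.socks_port", 1080)   # placeholder proxy port
    opts.set_preference("network.proxy.socks_remote_dns", True)

    driver = webdriver.Firefox(options=opts)
    driver.get("https://shopping.google.com")
    print(driver.title)
    driver.quit()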

    Due to connection errors, one of the Philippine profiles has no associated results. Also, for the Philippines, a few keywords did not lead to any results: videocassette recorders, totes, umbrellas. Similarly, for the US, no results were returned for totes and umbrellas.

    The search results have been analyzed to check whether there was evidence of price steering based on users' location.

    One term of usage applies:

    In any research product whose findings are based on this dataset, please cite

    @inproceedings{DBLP:conf/ircdl/CozzaHPN19,
      author    = {Vittoria Cozza and Van Tien Hoang and Marinella Petrocchi and Rocco {De Nicola}},
      title     = {Transparency in Keyword Faceted Search: An Investigation on Google Shopping},
      booktitle = {Digital Libraries: Supporting Open Science - 15th Italian Research Conference on Digital Libraries, {IRCDL} 2019, Pisa, Italy, January 31 - February 1, 2019, Proceedings},
      pages     = {29--43},
      year      = {2019},
      crossref  = {DBLP:conf/ircdl/2019},
      url       = {https://doi.org/10.1007/978-3-030-11226-4\_3},
      doi       = {10.1007/978-3-030-11226-4\_3},
      timestamp = {Fri, 18 Jan 2019 23:22:50 +0100},
      biburl    = {https://dblp.org/rec/bib/conf/ircdl/CozzaHPN19},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    

  9. Food_new Dataset

    • universe.roboflow.com
    zip
    Updated Jul 16, 2024
    Cite
    Allergen30 (2024). Food_new Dataset [Dataset]. https://universe.roboflow.com/allergen30/food_new-uuulf
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 16, 2024
    Dataset authored and provided by
    Allergen30
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Food Bounding Boxes
    Description

    Allergen30

    About Allergen30

    Allergen30 was created by Mayank Mishra, Nikunj Bansal, Tanmay Sarkar and Tanupriya Choudhury with the goal of building a robust detection model that can assist people in avoiding possible allergic reactions.

    It contains more than 6,000 images of 30 commonly used food items that can cause an adverse reaction in the human body. This dataset is one of the first research attempts at training a deep learning based computer vision model to detect the presence of such food items from images. It also serves as a benchmark for evaluating the efficacy of object detection methods in learning the otherwise difficult visual cues related to food items.

    Description of class labels

    There are multiple food items pertaining to specific food intolerances which can trigger an allergic reaction. Such food intolerances primarily include Lactose, Histamine, Gluten, Salicylate, Caffeine, and Ovomucoid intolerance. (Illustration: https://github.com/mmayank74567/mmayank74567.github.io/blob/master/images/FoodIntol.png?raw=true)

    The following table contains the description relating to the 30 class labels in our dataset.

    S. No. | Allergen | Food label | Description
    1 | Ovomucoid | egg | Images of egg with yolk (e.g. sunny side up eggs)
    2 | Ovomucoid | whole_egg_boiled | Images of soft and hard boiled eggs
    3 | Lactose/Histamine | milk | Images of milk in a glass
    4 | Lactose | icecream | Images of icecream scoops
    5 | Lactose | cheese | Images of swiss cheese
    6 | Lactose/Caffeine | milk_based_beverage | Images of tea/coffee with milk in a cup/glass
    7 | Lactose/Caffeine | chocolate | Images of chocolate bars
    8 | Caffeine | non_milk_based_beverage | Images of soft drinks and tea/coffee without milk in a cup/glass
    9 | Histamine | cooked_meat | Images of cooked meat
    10 | Histamine | raw_meat | Images of raw meat
    11 | Histamine | alcohol | Images of alcohol bottles
    12 | Histamine | alcohol_glass | Images of wine glasses with alcohol
    13 | Histamine | spinach | Images of spinach bundle
    14 | Histamine | avocado | Images of avocado sliced in half
    15 | Histamine | eggplant | Images of eggplant
    16 | Salicylate | blueberry | Images of blueberry
    17 | Salicylate | blackberry | Images of blackberry
    18 | Salicylate | strawberry | Images of strawberry
    19 | Salicylate | pineapple | Images of pineapple
    20 | Salicylate | capsicum | Images of bell pepper
    21 | Salicylate | mushroom | Images of mushrooms
    22 | Salicylate | dates | Images of dates
    23 | Salicylate | almonds | Images of almonds
    24 | Salicylate | pistachios | Images of pistachios
    25 | Salicylate | tomato | Images of tomato and tomato slices
    26 | Gluten | roti | Images of roti
    27 | Gluten | pasta | Images of one serving of penne pasta
    28 | Gluten | bread | Images of bread slices
    29 | Gluten | bread_loaf | Images of bread loaf
    30 | Gluten | pizza | Images of pizza and pizza slices

    Data collection

    We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. Images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. Many of the images downloaded from search engines were irrelevant to the purpose, especially ones with a lot of text in them; we deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.
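
    A minimal sketch of the RGB-channel filter mentioned above (illustrative only; the exact criterion is not specified, so this simply drops files that fail to decode or are not three-channel RGB images):

    from pathlib import Path
    from PIL import Image

    def has_full_rgb(path: Path) -> bool:
        try:
            with Image.open(path) as im:
                im.load()               # force full decoding; raises on truncated files
                return im.mode == "RGB"
        except OSError:
            return False

    kept = [p for p in Path("images").glob("*.jpg") if has_full_rgb(p)]
    print(f"kept {len(kept)} RGB images")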

    Fair use

    This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes. (Adapted from Christopher Thomas.)

    Citation

  10. healthsearchqa

    • huggingface.co
    Updated Mar 7, 2024
    + more versions
    Cite
    AISC Team D1 (2024). healthsearchqa [Dataset]. https://huggingface.co/datasets/aisc-team-d1/healthsearchqa
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 7, 2024
    Dataset authored and provided by
    AISC Team D1
    License

    https://choosealicense.com/licenses/unknown/

    Description

    HealthSearchQA

    Dataset of consumer health questions released by Google for the Med-PaLM paper (arXiv preprint). From the paper: We curated our own additional dataset consisting of 3,173 commonly searched consumer questions, referred to as HealthSearchQA. The dataset was curated using seed medical conditions and their associated symptoms. We used the seed data to retrieve publicly-available commonly searched questions generated by a search engine, which were displayed to all users… See the full description on the dataset page: https://huggingface.co/datasets/aisc-team-d1/healthsearchqa.

  11. Semantic Query Analysis from the Global Science Gateway - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Dec 19, 2018
    Cite
    (2018). Semantic Query Analysis from the Global Science Gateway - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/2cf68914-a4ff-535e-89bc-9b86b2ca555c
    Explore at:
    Dataset updated
    Dec 19, 2018
    Description

    Nowadays, web portals play an essential role in searching and retrieving information in several fields of knowledge: they are ever more technologically advanced and designed to support the storage of a huge amount of information in natural language originating from the queries launched by users worldwide. A good example is given by the WorldWideScience search engine:

    The database is available at . It is based on a similar gateway, Science.gov, which is the major path to U.S. government science information, as it pulls together Web-based resources from various agencies. The information in the database is intended to be of high quality and authority, as well as the most current available from the participating countries in the Alliance, so users will find that the results will be more refined than those from a general search of Google. It covers the fields of medicine, agriculture, the environment, and energy, as well as basic sciences. Most of the information may be obtained free of charge (the database itself may be used free of charge) and is considered "open domain." As of this writing, there are about 60 countries participating in WorldWideScience.org, providing access to 50+ databases and information portals. Not all content is in English. (Bronson, 2009)

    Given this scenario, we focused on building a corpus constituted by the query logs registered by the GreyGuide (Repository and Portal to Good Practices and Resources in Grey Literature) and received by the WorldWideScience.org (The Global Science Gateway) portal: the aim is to retrieve information related to social media, which as of today represents a considerable source of data more and more widely used for research ends.

    This project includes eight months of query logs registered between July 2017 and February 2018, for a total of 445,827 queries. The analysis mainly concentrates on the semantics of the queries received from the portal clients: it is a process of information retrieval from a rich digital catalogue whose language is dynamic, evolving and following – as well as reflecting – the cultural changes of our modern society.

  12. gooaq

    • huggingface.co
    • opendatalab.com
    Updated May 23, 2024
    Cite
    Ai2 (2024). gooaq [Dataset]. https://huggingface.co/datasets/allenai/gooaq
    Explore at:
    Dataset updated
    May 23, 2024
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    Apache License v2.0, https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    GooAQ is a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections.
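
    A minimal sketch for loading GooAQ with the Hugging Face datasets library (assuming the standard train split; we inspect a record rather than rely on specific field names):

    from datasets import load_dataset

    ds = load_dataset("allenai/gooaq", split="train")
    print(ds.num_rows)  # number of question-answer records
    print(ds[0])        # inspect one record and its fields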

  13. Outscraper Google Maps Scraper

    • datarade.ai
    .csv, .xls, .json
    Updated Dec 9, 2021
    Cite
    (2021). Outscraper Google Maps Scraper [Dataset]. https://datarade.ai/data-products/outscraper-google-maps-scraper-outscraper
    Explore at:
    Available download formats: .csv, .xls, .json
    Dataset updated
    Dec 9, 2021
    Area covered
    United States
    Description

    Are you looking to identify B2B leads to promote your business, product, or service? Outscraper Google Maps Scraper might just be the tool you've been searching for. This powerful software enables you to extract business data directly from Google's extensive database, which spans millions of businesses across countless industries worldwide.

    Outscraper Google Maps Scraper is a tool built with advanced technology that lets you scrape a myriad of valuable information about businesses from Google's database. This information includes, but is not limited to, business names, addresses, contact information, website URLs, reviews, ratings, and operational hours.

    Whether you are a small business trying to make a mark or a large enterprise exploring new territories, the data obtained from the Outscraper Google Maps Scraper can be a treasure trove. This tool provides a cost-effective, efficient, and accurate method to generate leads and gather market insights.

    By using Outscraper, you'll gain a significant competitive edge as it allows you to analyze your market and find potential B2B leads with precision. You can use this data to understand your competitors' landscape, discover new markets, or enhance your customer database. The tool offers the flexibility to extract data based on specific parameters like business category or geographic location, helping you to target the most relevant leads for your business.

    In a world that's growing increasingly data-driven, utilizing a tool like Outscraper Google Maps Scraper could be instrumental to your business' success. If you're looking to get ahead in your market and find B2B leads in a more efficient and precise manner, Outscraper is worth considering. It streamlines the data collection process, allowing you to focus on what truly matters – using the data to grow your business.

    https://outscraper.com/google-maps-scraper/

    As a result of the Google Maps scraping, your data file will contain the following details:

    Query, Name, Site, Type, Subtypes, Category, Phone, Full Address, Borough, Street, City, Postal Code, State, US State, Country, Country Code, Latitude, Longitude, Time Zone, Plus Code, Rating, Reviews, Reviews Link, Reviews Per Scores, Photos Count, Photo, Street View, Working Hours, Working Hours Old Format, Popular Times, Business Status, About, Range, Posts, Verified, Owner ID, Owner Title, Owner Link, Reservation Links, Booking Appointment Link, Menu Link, Order Links, Location Link, Place ID, Google ID, Reviews ID

    If you want to enrich your datasets with social media accounts and many more details you could combine Google Maps Scraper with Domain Contact Scraper.

    Domain Contact Scraper can scrape these details:

    Email, Facebook, Github, Instagram, Linkedin, Phone, Twitter, Youtube

  14. Replication data for: A Global Survey of Bike Bus Initiatives - Dataset -...

    • b2find.eudat.eu
    Updated Jun 8, 2024
    Cite
    (2024). Replication data for: A Global Survey of Bike Bus Initiatives - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/f99f1655-b105-547d-b63b-51a7488c47ac
    Explore at:
    Dataset updated
    Jun 8, 2024
    Description

    This dataset aims to map the existing Bike Bus initiatives worldwide, identify their diversity, and understand their challenges and motivations. The dataset is divided into two files: the BB_database, which maps the name, location, and contact (when available) of the Bike Bus initiatives that we could find, and the BB_survey, which contains the data resulting from an online survey aimed at Bike Bus organizers identified through the BB_database. The data from the survey includes information on the name of the Bike Bus initiatives, location, organizing actors, starting year, contact details, route characteristics (travel time, distance, frequency, and space to cycle), participants (number of children and adults, age, and gender), and management (child supervision, goals, barriers, motivations and challenges). Most of the initiatives are in Catalonia and Spain, yet the database includes Bike Buses worldwide.

    Data was derived from an unpublished master's dissertation: Martín, S. (2022). BiciBús in Catalonia: Rutes and characteristics of the bike-train movement. Institute of Environmental Science and Technology at the Universitat Autònoma de Barcelona.

    Description of methods used for collection/generation of data: The data comes from a mapping exercise of Bike Buses and an online survey aimed at Bike Bus organizers. It builds on the online survey by Martín (2022) for her master's thesis. In this first phase of the project, led by Martín (2022), respondents were approached via social media and email, identified through an archival analysis on Google, Facebook, Instagram, and Twitter (now X) using the keyword "Bicibus". The rest of the respondents were approached using the snowball sampling method. The scope of this first survey was Spain, and it was available in Spanish and Catalan. The survey received 19 responses during this phase.

    The second phase of the data collection expanded the scope of the survey internationally. The questions were translated into English, and the literature review was extended using the search engines Google Scholar and Scopus with the keywords "Bike Bus" and "Bike Train". By the end of the data collection, the survey had received 143 responses.

  15. Multilingual Scraper of Privacy Policies and Terms of Service

    • zenodo.org
    bin, zip
    Updated Apr 24, 2025
    Cite
    David Bernhard; Luka Nenadic; Stefan Bechtold; Karel Kubicek (2025). Multilingual Scraper of Privacy Policies and Terms of Service [Dataset]. http://doi.org/10.5281/zenodo.14562039
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David Bernhard; Luka Nenadic; Stefan Bechtold; Karel Kubicek
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Multilingual Scraper of Privacy Policies and Terms of Service: Scraped Documents of 2024

    This dataset supplements the publication "Multilingual Scraper of Privacy Policies and Terms of Service" at ACM CSLAW’25, March 25–27, 2025, München, Germany. It includes the first 12 months of scraped policies and terms from about 800k websites; see concrete numbers below.

    The following table lists the amount of websites visited per month:

    Month | Number of websites
    2024-01 | 551'148
    2024-02 | 792'921
    2024-03 | 844'537
    2024-04 | 802'169
    2024-05 | 805'878
    2024-06 | 809'518
    2024-07 | 811'418
    2024-08 | 813'534
    2024-09 | 814'321
    2024-10 | 817'586
    2024-11 | 828'662
    2024-12 | 827'101

    The number of websites visited should always be higher than the number of jobs (Table 1 of the paper), as a website may redirect (resulting in two websites scraped) or may have to be retried.

    To simplify access, we release the data in large CSVs. Namely, there is one file for policies and another for terms per month. All of these files contain all metadata usable for the analysis. If your favourite CSV parser reports the same numbers as above, then our dataset is correctly parsed. We use ',' as a separator, the first row is the heading, and strings are in quotes.

    Our scraper sometimes collects documents other than policies and terms (for how often this happens, see the evaluation in Sec. 4 of the publication); these may contain personal data, such as addresses of website authors who maintain sites only for a selected audience. We therefore decided to reduce the risks for websites by anonymizing the data using Presidio. Presidio substitutes personal data with tokens. If your personal data has not been effectively anonymized from the database and you wish for it to be deleted, please contact us.

    Preliminaries

    The uncompressed dataset is about 125 GB in size, so you will need sufficient storage. This also means that you likely cannot process all the data at once in memory, so we split the data by month and into separate files for policies and terms.

    Files and structure

    The files have the following names:

    • 2024_policy.csv for policies
    • 2024_terms.csv for terms

    Shared metadata

    Both files contain the following metadata columns:

    • website_month_id - identification of crawled website
    • job_id - one website can have multiple jobs in case of redirects (but most commonly has only one)
    • website_index_status - network state of loading the index page. This is resolved via the Chrome DevTools Protocol.
      • DNS_ERROR - domain cannot be resolved
      • OK - all fine
      • REDIRECT - domain redirect to somewhere else
      • TIMEOUT - the request timed out
      • BAD_CONTENT_TYPE - 415 Unsupported Media Type
      • HTTP_ERROR - 404 error
      • TCP_ERROR - error in the network connection
      • UNKNOWN_ERROR - unknown error
    • website_lang - language of the index page, detected with the langdetect library
    • website_url - the URL of the website sampled from the CrUX list (may contain subdomains, etc). Use this as a unique identifier for connecting data between months.
    • job_domain_status - indicates the status of loading the index page. Can be:
      • OK - all works well (at the moment, should be all entries)
      • BLACKLISTED - URL is on our list of blocked URLs
      • UNSAFE - website is not safe according to Google's Safe Browsing API
      • LOCATION_BLOCKED - country is in the list of blocked countries
    • job_started_at - when the visit of the website was started
    • job_ended_at - when the visit of the website was ended
    • job_crux_popularity - JSON with all popularity ranks of the website this month
    • job_index_redirect - when we detect that the domain redirects us, we stop the crawl and create a new job with the target URL. This saves time if many websites redirect to one target, as it will be crawled only once. The index_redirect is then the job.id corresponding to the redirect target.
    • job_num_starts - amount of crawlers that started this job (counts restarts in case of unsuccessful crawl, max is 3)
    • job_from_static - whether this job was included in the static selection (see Sec. 3.3 of the paper)
    • job_from_dynamic - whether this job was included in the dynamic selection (see Sec. 3.3 of the paper) - this is not exclusive with from_static - both can be true when the lists overlap.
    • job_crawl_name - our name of the crawl, contains year and month (e.g., 'regular-2024-12' for regular crawls, in Dec 2024)

    Policy data

    • policy_url_id - ID of the URL this policy has
    • policy_keyword_score - score (higher is better) according to the crawler's keyword list that a given document is a policy
    • policy_ml_probability - probability assigned by the BERT model that a given document is a policy
    • policy_consideration_basis - the basis on which we decided that this URL is a policy. The following three options are executed by the crawler in this order:
      1. 'keyword matching' - this policy was found using the crawler navigation (which is based on keywords)
      2. 'search' - this policy was found using a search engine
      3. 'path guessing' - this policy was found by using well-known URLs like example.com/policy
    • policy_url - full URL to the policy
    • policy_content_hash - used as identifier - if the document remained the same between crawls, it won't create a new entry
    • policy_content - contains the text of policies and terms extracted to Markdown using Mozilla's readability library
    • policy_lang - language of the content, detected by fasttext
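
    A minimal sketch of loading one monthly policy file with pandas (chunked, given the ~125 GB uncompressed size; the file name and column names are as documented above):

    import pandas as pd

    # Stream the policy CSV and keep successfully loaded, English-language sites.
    frames = []
    for chunk in pd.read_csv("2024_policy.csv", chunksize=100_000):
        mask = (chunk["website_index_status"] == "OK") & (chunk["website_lang"] == "en")
        frames.append(chunk.loc[mask, ["website_url", "policy_url", "policy_lang"]])

    policies = pd.concat(frames, ignore_index=True)
    print(len(policies), "English policies with OK index status")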

    Terms data

    Analogous to policy data, just substitute policy to terms.

    Updates

    Check this Google Doc for an updated version of this README.md.

  16. Innovation, Language, and the Web - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Feb 19, 2013
    Cite
    (2013). Innovation, Language, and the Web - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/66f67b04-e6d8-5a8d-a979-c787ffdd39db
    Explore at:
    Dataset updated
    Feb 19, 2013
    Description

    Language and innovation are inseparable. Language conveys ideas which are essential in innovation, establishes the most immediate connections with our conceptualisation of the outside world, and provides the building blocks for communication. Every linguistic choice is necessarily meaningful, and it involves the parallel construction of form and meaning. From this perspective, language is a dynamic knowledge construction process. Emphasis is laid on investigating how words are used to describe innovation, and how innovation topics can influence word usage and collocational behaviour. The lexical representation of innovative knowledge in a context-based approach is closely related to the representation of knowledge itself, and gives the opportunity to reduce the gap between knowledge representation and knowledge understanding. This will bring into focus the dynamic interplay between lexical creativity and innovative pragmatic contexts, and the necessity for a dynamic semantic shift from context-driven vagueness to domain-driven specialisation.

    Methodology and experimental evidence - Method and materials: the challenge of identifying changes in word sense has only recently been considered in Computational Linguistics. To investigate the themes discussed in the previous sections, genre-oriented and stylistically heterogeneous English texts are analysed with the support of SKETCH ENGINE (Kilgarriff et al., 2004), a corpus query tool, based on a distributed infrastructure, that generates word sketches and thesauri which specify similarities and differences between near-synonyms. By selecting a collocate of interest in a sketched word, the user is taken to a concordance of the corpus evidence giving rise to that collocate.

    Ambiguous and polysemous words have been selected with particular reference to innovative domains, and their collocations are analysed. In particular, we considered the domain of brain sciences and new technologies of brain functional imaging, the domain of knowledge management processes, and the field of information technologies, mainly focusing on the following test words: IMAGING, RETENTION, STORAGE, CORPUS, NETWORK, GRID. The selected words present a potentially high degree of semantic ambiguity or polysemy and different degrees of semantic specialisation, which can be analysed objectively by studying their context collocations.

    For terminology exploration, both domain-specific and general-purpose text materials are selected using generic web search engine queries (www.google.com, using seed words), domain-specific databases, and type-coherent multidisciplinary large corpora (e.g. www.opengrey.eu, www.ncbi.nlm.nih.gov/pubmed, selecting the domain). Collocations and concordances are then compared with large balanced corpora (e.g. the British National Corpus, British Academic Written English, New Model Corpus, and the like, whose size ranges between 8 M and 12 G tokens).

  17. Coral restoration database – Dataset from Bostrom-Einarsson et al 2019 (NESP...

    • researchdata.edu.au
    bin
    Updated 2019
    Cite
    Bostrom-Einarsson, Lisa, Dr.; Ceccarelli, Daniela, Dr.; Cook, Nathan, Mr.; Hein, Margaux, Dr.; Smith, Adam, Dr.; McLeod, Ian M, Dr. (2019). Coral restoration database – Dataset from Bostrom-Einarsson et al 2019 (NESP TWQ 4.3, JCU) [Dataset]. https://researchdata.edu.au/coral-restoration-database-43-jcu/1425277
    Explore at:
    Available download formats: bin
    Dataset updated
    2019
    Dataset provided by
    eAtlas
    Authors
    Bostrom-Einarsson, Lisa, Dr.; Ceccarelli, Daniela, Dr.; Cook, Nathan, Mr.; Hein, Margaux, Dr.; Smith, Adam, Dr.; McLeod, Ian M, Dr.
    License

    Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2017 - Jan 31, 2019
    Description

    This dataset consists of a review of case studies and descriptions of coral restoration methods from four sources: 1) the primary literature (i.e. published peer-reviewed scientific literature), 2) grey literature (e.g. scientific reports and technical summaries from experts in the field), 3) online descriptions (e.g. blogs and online videos describing projects), and 4) an online survey targeting restoration practitioners (doi:10.5061/dryad.p6r3816).

    Only case studies that actively conducted coral restoration (i.e. involving at least one stage of the scleractinian coral life history) are included. This excludes indirect coral restoration projects, such as disturbance mitigation (e.g. predator removal, disease control) and passive restoration interventions (e.g. enforcement of controls against dynamite fishing, or water quality improvement). It also excludes many artificial reefs, in particular where the aim was fisheries enhancement (i.e. fish aggregation devices) or corals were not included in the method. To the best of our abilities, duplication of case studies was avoided across the four separate sources, so that each case in the review and database represents a separate project.

    This dataset is currently under embargo until the review manuscript is published.

    Methods: More than 40 separate categories of data were recorded from each case study and entered into a database. These included data on (1) the information source, (2) the case study particulars (e.g. location, duration, spatial scale, objectives, etc.), (3) specific details about the methods, (4) coral details (e.g. genus, species, morphology), (5) monitoring details, and (6) the outcomes and conclusions.

    Primary literature: Multiple search engines were used to achieve the most complete coverage of the scientific literature. First, the scientific literature was searched using Google Scholar with the keywords “coral* + restoration”. Because the field (and therefore the search results) is dominated by transplantation studies, separate searches were then conducted for other common techniques using “coral* + restoration + [technique name]”. This search was further complemented by using the same keywords in ISI Web of Knowledge (search yield n=738). Studies that fulfilled our criteria for active coral restoration, described above, were then manually selected (final yield n=221). Where a single paper described several different projects or methods, these were split into separate case studies. Finally, prior reviews of coral restoration were consulted to obtain case studies from their reference lists.
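    As a minimal illustration of this query-construction step, the snippet below generates the base query plus per-technique variants; the technique list is illustrative, not the authors' exact list.

    ```python
    # Build the Google Scholar / Web of Knowledge query strings described above.
    BASE = "coral* + restoration"
    TECHNIQUES = [  # hypothetical technique names for illustration
        "transplantation",
        "coral gardening",
        "larval enhancement",
        "substrate stabilisation",
        "artificial reef",
    ]

    queries = [BASE] + [f"{BASE} + {t}" for t in TECHNIQUES]
    for q in queries:
        print(q)  # each string would be issued as a separate literature search
    ```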

    Grey literature: While many reports appeared in the Google Scholar searches, The Nature Conservancy (TNC) database of reports for North American coastal restoration projects (http://projects.tnc.org/coastal/) was also searched. This was supplemented with reports listed in the reference lists of other papers, reports and reviews, and with reports encountered during the online searches (n=30).

    Online records: Small-scale projects conducted without substantial input from researchers, academics, non-governmental organisations (NGOs) or coral reef managers often do not result in formal written accounts of methods. To access this information, we conducted online searches of YouTube, Facebook and Google using the search term “Coral restoration”. The information provided in videos, blog posts and websites was used to describe further projects (n=48). Due to the unverified nature of such accounts, the data collected from these online-only records were limited compared to the peer-reviewed literature and surveys. At a minimum, the location, the methods used, and the reported outcomes or lessons learned were included in this review.

    Online survey: To access information from projects not published elsewhere, an online survey targeting restoration practitioners was designed. The survey, conducted under JCU human ethics approval H7218 (following the Australian National Statement on Ethical Conduct in Human Research, 2007), consisted of 25 questions querying restoration practitioners about projects they had undertaken. These data (n=63) are included in all calculations within this review, but are not publicly available, to preserve the anonymity of participants. Although we encouraged participants to fill out a separate survey for each case study, it is possible that participants included multiple separate projects in a single survey, which may mean the number of case studies reported here understates the real number.

    Data analysis: Percentages, counts and other quantifications from the database refer to the total number of case studies with data in that category. Case studies lacking data for the category in question, or lacking appropriate detail (e.g. reporting ‘mixed’ for coral genera), are not included in calculations. Many categories allowed multiple answers (e.g. coral species); these were split into separate records for calculations. For this reason, absolute numbers may exceed the number of case studies in the database; percentages, however, reflect the proportion of case studies in each category. We used the seven objectives outlined in [1] to classify the objective of each case study, with two additional categories (‘scientific research’ and ‘ecological engineering’). We used Tableau to visualise and analyse the database (Desktop Professional Edition, version 10.5, Tableau Software). The data have been made available following the FAIR Guiding Principles for scientific data management and stewardship [2]. The data can be downloaded from the Dryad Digital Repository (https://doi.org/10.5061/dryad.p6r3816) and explored visually at: https://public.tableau.com/views/CoralRestorationDatabase-Visualisation/Coralrestorationmethods?:embed=y&:display_count=yes&publish=yes&:showVizHome=no#1.
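    The counting rule above (category percentages computed only over case studies with usable data, after splitting multi-answer fields) can be sketched in pandas; the column names and toy records below are hypothetical.

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "case_id": [1, 2, 3, 4],
        "coral_genus": ["Acropora;Pocillopora", "Acropora", None, "mixed"],
    })

    # Missing or uninformative entries ('mixed') count as no data for the category.
    known = df[df["coral_genus"].notna() & (df["coral_genus"] != "mixed")].copy()

    # Split multi-answer fields into separate records, as in the review.
    known["coral_genus"] = known["coral_genus"].str.split(";")
    records = known.explode("coral_genus")

    # Share of case studies (with data) mentioning each genus; shares can sum to
    # more than 100% because one case study may list several genera.
    pct = (records.groupby("coral_genus")["case_id"].nunique()
           / known["case_id"].nunique() * 100)
    print(pct)
    ```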

    Limitations: While our expanded search enabled us to avoid the bias from the more limited published literature, we acknowledge that using sources that have not undergone rigorous peer-review potentially introduces another bias. Many government reports undergo an informal peer-review; however, survey results and online descriptions may present a subjective account of restoration outcomes. To reduce subjective assessment of case studies, we opted not to interpret results or survey answers, instead only recording what was explicitly stated in each document [3, 4].

    Defining restoration In this review, active restoration methods are methods which reintroduce coral (e.g. coral fragment transplantation, or larval enhancement) or augment coral assemblages (e.g. substrate stabilisation, or algal removal), for the purposes of restoring the reef ecosystem. In the published literature and elsewhere, there are many terms that describe the same intervention. For clarity, we provide the terms we have used in the review, their definitions and alternative terms (see references). Passive restoration methods such as predator removal (e.g. crown-of-thorns starfish and Drupella control) have been excluded, unless they were conducted in conjunction with active restoration (e.g. macroalgal removal combined with transplantation).

    Format: The data are supplied as an Excel file with three separate tabs for 1) peer-reviewed literature, 2) grey literature, and 3) a description of the objectives from Hein et al. 2017. Survey responses have been excluded to preserve the anonymity of the respondents.
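    A minimal sketch for loading the supplied workbook, assuming pandas and a local copy of the file; the file name is a placeholder, and the tab names should be checked against the actual spreadsheet.

    ```python
    import pandas as pd

    # sheet_name=None loads all three tabs into a dict of DataFrames.
    sheets = pd.read_excel("coral_restoration_database.xlsx", sheet_name=None)
    for name, df in sheets.items():
        print(name, df.shape)
    ```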

    This dataset is a database that underpins a 2018 report and a 2019 published review of coral restoration methods from around the world. - Bostrom-Einarsson L, Ceccarelli D, Babcock R.C., Bayraktarov E, Cook N, Harrison P, Hein M, Shaver E, Smith A, Stewart-Sinclair P.J, Vardi T, McLeod I.M. 2018 - Coral restoration in a changing world - A global synthesis of methods and techniques, report to the National Environmental Science Program. Reef and Rainforest Research Centre Ltd, Cairns (63pp.). - The review manuscript is currently under review.

    Data Dictionary: The Data Dictionary is embedded in the Excel spreadsheet. Comments are included in the column titles to aid interpretation and/or refer to additional information tabs. For more information on each column, open the red triangle located at the top right of the cell.

    References:
    1. Hein MY, Willis BL, Beeden R, Birtles A. The need for broader ecological and socioeconomic tools to evaluate the effectiveness of coral restoration programs. Restoration Ecology. Wiley/Blackwell; 2017;25: 873–883. doi:10.1111/rec.12580
    2. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. Nature Publishing Group; 2016;3: 160018. doi:10.1038/sdata.2016.18
    3. Miller RL, Marsh H, Cottrell A, Hamann M. Protecting Migratory Species in the Australian Marine Environment: A Cross-Jurisdictional Analysis of Policy and Management Plans. Front Mar Sci. Frontiers; 2018;5: 211. doi:10.3389/fmars.2018.00229
    4. Ortega-Argueta A, Baxter G, Hockings M. Compliance of Australian threatened species recovery plans with legislative requirements. Journal of Environmental Management. Elsevier; 2011;92: 2054–2060.

    Data Location:

    This dataset is filed in the eAtlas enduring data repository at: data\2018-2021-NESP-TWQ-4\4.3_Best-practice-coral-restoration

  18. Share of Yahoo in mobile search market India 2019-2024

    • statista.com
    Updated Mar 27, 2024
    Cite
    Statista (2024). Share of Yahoo in mobile search market India 2019-2024 [Dataset]. https://www.statista.com/statistics/938848/india-yahoo-share-in-mobile-search-market/
    Explore at:
    Dataset updated
    Mar 27, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Jun 2019 - Feb 2024
    Area covered
    India
    Description

    Yahoo's share of the mobile search engine market across India was about 0.03 percent in February 2024, a fall from its standing of 0.24 percent in September 2018. The immense popularity and vast database of Google have left little to gain for other search engine operators in India.

  19. H

    Land Use Changes in The Mississippi River Basin Floodplains: 1941 to 2000...

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Nov 24, 2022
    Cite
    Adnan Rajib; Qianjin Zheng; Heather E. Golden; Charles R. Lane; Qiusheng Wu; Jay R. Christensen; Ryan Morrison; Fernando Nardi; Antonio Annis (2022). Land Use Changes in The Mississippi River Basin Floodplains: 1941 to 2000 (version 1) [Dataset]. http://doi.org/10.4211/hs.41a3a9a9d8e54cc68f131b9a9c6c8c54
    Explore at:
    Available download formats: zip (274.8 MB)
    Dataset updated
    Nov 24, 2022
    Dataset provided by
    HydroShare
    Authors
    Adnan Rajib; Qianjin Zheng; Heather E. Golden; Charles R. Lane; Qiusheng Wu; Jay R. Christensen; Ryan Morrison; Fernando Nardi; Antonio Annis
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1940 - Dec 31, 2000
    Area covered
    Description

    This work has been published in Nature Scientific Data. Suggested citation: Rajib et al. The changing face of floodplains in the Mississippi River Basin detected by a 60-year land use change dataset. Nature Scientific Data 8, 271 (2021). https://doi.org/10.1038/s41597-021-01048-w

    Here, we present the first available dataset that quantifies land use change along the floodplains of the Mississippi River Basin (MRB), covering 60 years (1941–2000) at 250-m resolution. The MRB is the fourth largest river basin in the world (3.3 million sq km), comprising 41% of the United States and draining into the Gulf of Mexico, an area with an annually expanding and contracting hypoxic zone resulting from basin-wide over-enrichment of nutrients. The basin represents one of the most engineered systems in the world and includes a complex web of dams, levees, floodplains, and dikes. This new dataset reveals the heterogeneous spatial extent of land use transformations in MRB floodplains. The dominant transition in floodplains has been from natural ecosystems (e.g. wetlands or forests) to agricultural use. A steady increase in developed land use within the MRB floodplains is also evident.

    To maximize the reuse of this dataset, our contributions also include four unique products:
    (i) a Google Earth Engine interactive map visualization interface: https://gishub.org/mrb-floodplain
    (ii) a Google-based Python code that runs in any internet browser: https://colab.research.google.com/drive/1vmIaUCkL66CoTv4rNRIWpJXYXp4TlAKd?usp=sharing
    (iii) an online tutorial with visualizations facilitating classroom application of the code: https://serc.carleton.edu/hydromodules/steps/241489.html
    (iv) an instructional video showing how to run the code and partially reproduce the floodplain land use change dataset: https://youtu.be/wH0gif_y15A
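    For readers who prefer to query the data directly, the sketch below shows one plausible way to tabulate per-class floodplain area with the Earth Engine Python API. The asset ID is a placeholder, not the dataset's published path; the authors' map and Colab notebook linked above document the real assets.

    ```python
    import ee

    ee.Initialize()

    # Hypothetical asset path for one epoch of the 250-m floodplain land use grid.
    landuse = ee.Image("users/example/mrb_floodplain_landuse_2000")

    # Sum pixel area (km^2) grouped by land use class (band order: area, class).
    stats = ee.Image.pixelArea().divide(1e6).addBands(landuse).reduceRegion(
        reducer=ee.Reducer.sum().group(groupField=1, groupName="class"),
        geometry=landuse.geometry(),
        scale=250,
        maxPixels=1e13,
    )
    print(stats.getInfo())  # per-class area sums appear under 'groups'
    ```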

  20. r

    Analysis of search queries suggested by a Swedish climate obstruction...

    • researchdata.se
    Updated Jul 1, 2025
    + more versions
    Cite
    Malte Rödl (2025). Analysis of search queries suggested by a Swedish climate obstruction network [Dataset]. http://doi.org/10.5878/zb1v-ba15
    Explore at:
    Available download formats: (15182), (2773), (3350), (8612), (1722)
    Dataset updated
    Jul 1, 2025
    Dataset provided by
    Swedish University of Agricultural Sciences
    Authors
    Malte Rödl
    Time period covered
    Jan 1, 2014 - Jul 31, 2022
    Area covered
    Sweden
    Description

    This data comprises data traces related to search queries used in climate obstruction. It is based on "klimatsans" (Climate Sense or Climate Reason; translated from Swedish, cf. Vowles & Hultman, 2021), a Swedish network that has existed since 2014; it runs a Swedish-language blog and submits opinion pieces and letters to the editor to various Swedish news outlets. The stated aims of the network amount to first-level obstruction, i.e. they reject the scientific consensus that increased atmospheric CO2 leads to climate change.

    The data concerns how the network, throughout its various publications, invites readers to "google" certain words (keyphrases). The data set includes: 1) all blog posts published on klimatsans.com from January 2014 to June 2022; 2) all hyperlinks from the blog; 3) tabulation, count, and coding of all search queries suggested in the blog, identified as the text following the Swedish imperative verb "googla"; 4) tabulation of all uses of 25 selected keyphrases in Swedish newspapers; 5) search engine results pages for these 25 queries from Google and DuckDuckGo (each run three times: plain, verbatim using quotation marks, and preceded by the term "googla") (original data available via SĂŒnkler et al., 2023); 6) tabulation and coding of domains frequently targeted by hyperlinks and/or listed in search engine results pages.
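    Item 3 above (identifying suggested queries as the text following the imperative "googla") can be sketched with a simple regular expression; the pattern and quoting conventions here are assumptions, and the dataset's README documents the actual extraction rules.

    ```python
    import re
    from collections import Counter

    # Capture the phrase after "googla", stopping at a closing quote or
    # sentence-ending punctuation.
    PATTERN = re.compile(r'googla\s+"?([^".!?\n]+)', re.IGNORECASE)

    def suggested_queries(posts):
        counts = Counter()
        for text in posts:
            for phrase in PATTERN.findall(text):
                counts[phrase.strip().lower()] += 1
        return counts

    # Example post (Swedish: "Don't believe the alarms - google 'Klimatsans'!")
    posts = ['Tro inte pa larmen - googla "Klimatsans"!']
    print(suggested_queries(posts))  # Counter({'klimatsans': 1})
    ```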

    Furthermore, the data set includes some scripts for replication, an extensive README file for methodological additions, and details on coding schemes.

    The data was originally collected to trace data voids through the texts of their creators or proponents. This provides insights into how data voids are created, promoted, used, and, if they do not disappear, abandoned.
