42 datasets found
  1. Job Offers Web Scraping Search

    • kaggle.com
    Updated Feb 11, 2023
    Cite
    The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Offers Web Scraping Search

    Targeted Results to Find the Optimal Work Solution

    By [source]

    About this dataset

    This dataset collects job offers gathered by web scraping and filtered by specific keywords, locations and times. It lets users search precisely for the working arrangement that suits them and explore options that match their personal situation, skill set and preferences for location and schedule. The columns give detailed information on job titles, employer names, locations, time frames and schedules, so you can make an informed choice about your next career opportunity.


    How to use the dataset

    This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:

    • Start by identifying what type of job offer you want to find. The keyword column will help you narrow down your search by letting you look for job postings that contain the word or phrase you have in mind.

    • Next, consider where the job is located. The Location column (Ubicació) tells you where each posting is from, so make sure it is somewhere that suits your needs.

    • Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether the role is full-time, part-time or casual/temporary, so check that it meets your requirements before applying.

    • Additionally, if hours per week or further schedule details are important criteria, see the Horari and Temps_Oferta columns. Once all three criteria (keywords, location and time frame) are covered, look at the Empresa (company name) and Nom_Oferta (offer name) columns to get an idea of who would be employing you should you land the job.

      Together, these pieces of data should give any motivated individual what they need to seek out an optimal work solution. Good luck with the search!

    Research Ideas

    • Machine learning can be used to group job offers, making similarities and differences between them easier to identify. This could allow users to target their search for a work solution more specifically.
    • The data can be used to compare job offerings across different areas or types of jobs, helping users make better-informed decisions about their career options and goals.
    • It may also provide insight into the local job market, helping companies and employers identify potential new opportunities or trends that may previously have gone unnoticed.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: web_scraping_information_offers.csv

    | Column name  | Description                         |
    |:-------------|:------------------------------------|
    | Nom_Oferta   | Name of the job offer. (String)     |
    | Empresa      | Company offering the job. (String)  |
    | Ubicació     | Location of the job offer. (String) |
    | Temps_Oferta | Time of the job offer. (String)     |
    | Horari       | Schedule of the job offer. (String) |
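
    The search tips above can be sketched in a few lines of Python. This is a minimal sketch, not part of the dataset: the sample rows are hypothetical, the helper name filter_offers is invented for illustration, and UTF-8 encoding of the real file is an assumption. Only the five column names come from the table above.

```python
import csv
import io

# Hypothetical sample rows; only the header mirrors the documented columns.
SAMPLE_CSV = """Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari
Data Analyst,Acme SL,Barcelona,Fa 2 dies,Jornada completa
Cambrer/a,Bar Exemple,Girona,Fa 1 setmana,Jornada parcial
Data Engineer,Exemple Tech,Barcelona,Fa 3 dies,Jornada completa
"""


def filter_offers(rows, keyword=None, location=None):
    """Keep offers whose title contains `keyword` and whose location contains `location`.

    Both tests are case-insensitive substring matches; a filter left as None is skipped.
    """
    matches = []
    for row in rows:
        if keyword and keyword.lower() not in row["Nom_Oferta"].lower():
            continue
        if location and location.lower() not in row["Ubicació"].lower():
            continue
        matches.append(row)
    return matches


# For the real file, swap the StringIO for (encoding is an assumption):
#   open("web_scraping_information_offers.csv", newline="", encoding="utf-8")
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
barcelona_data_jobs = filter_offers(rows, keyword="data", location="barcelona")
for offer in barcelona_data_jobs:
    print(offer["Nom_Oferta"], "|", offer["Empresa"], "|", offer["Horari"])
```

    Substring matching is the simplest choice here; the values in the real file may need normalization (accents, abbreviations) before filtering.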


  2. Search Engines in the US - Market Research Report (2015-2030)

    • ibisworld.com
    Updated Jul 15, 2025
    Cite
    IBISWorld (2025). Search Engines in the US - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/united-states/market-research-reports/search-engines-industry/
    Explore at:
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    IBISWorld
    License

    https://www.ibisworld.com/about/termsofuse/

    Time period covered
    2015 - 2030
    Area covered
    United States
    Description

    Search engines, which collect, organize and display knowledge of the internet, remain central to the digital economy but are entering a period of rapid transformation driven by AI and shifting user behavior. Over the past five years, internet advertising spending maintained strong momentum, propelled by growing mobile internet access and consumer screen time. Consequently, industry revenue is expected to climb at a CAGR of 9.4% to $316.8 billion, including an anticipated rise of 7.7% in 2025, with profit at 18.6%. The industry stands apart from most in the tech sector, because of its platform-based revenue model, aggregation dynamics and deep integration with the broader digital ecosystem. While user engagement fuels relevance, it is advertiser demand that sustains revenue, requiring a careful balance between utility and monetization. This landscape has been reshaped by the rise of generative AI. Conversational tools and AI-generated summaries are reducing user interaction with traditional search results, challenging established SEO practices and disrupting referral-based traffic flows. Meanwhile, search engines are reconfiguring their ad models to prioritize quality and contextual relevance, moving away from legacy monetization strategies. These trends signal a broader shift in how search platforms operate, less as navigational tools and more as integrated, AI-driven environments. As digital behavior fragments and users seek information across apps like Amazon, TikTok and ChatGPT, industry revenue is still projected to climb at a CAGR of 7.3% to $449.9 billion through 2030. Advertisers are expected to continue investing in search, drawn by the format’s performance insights and optimization capabilities. However, AI is redefining search from a navigational tool into a task-oriented solution engine, where users expect conversational, multimodal and predictive answers instead of traditional results pages. 
    To stay relevant, incumbent platforms must evolve into embedded AI utilities that power experiences across devices and enterprise workflows.
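
    As a rough consistency check on the forecast figures, the compound annual growth rate implied by the two quoted revenue values can be recomputed. This is a sketch under one assumption: the forecast is taken to run five years, from $316.8 billion in 2025 to $449.9 billion in 2030. The report does not state its exact base year, so the result should only be read as approximately matching the quoted 7.3%.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description, in billions of U.S. dollars.
revenue_2025 = 316.8
revenue_2030 = 449.9
years = 5  # assumption: the forecast runs from 2025 to 2030

cagr = implied_cagr(revenue_2025, revenue_2030, years)
print(f"Implied CAGR: {cagr:.1%}")
```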

  3. Global search volume for "AI" keyword 2022-2023

    • statista.com
    Updated May 23, 2025
    Cite
    Statista (2025). Global search volume for "AI" keyword 2022-2023 [Dataset]. https://www.statista.com/statistics/1398211/ai-keyword-traffic-volume/
    Explore at:
    Dataset updated
    May 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jun 2022 - Mar 2023
    Area covered
    Worldwide
    Description

    Between June 2022 and March 2023, the traffic volume for the keyword "AI" more than tripled, going from around 7.9 million monthly searches to more than 30.4 million during the last month of the measured period. General interest in artificial intelligence (AI) had exploded in markets like the United States by the end of 2022. Likewise, interest in the application programming interfaces (APIs) and plugins of artificial intelligence solutions, especially those of ChatGPT, has also seen a major increase since the release of the tool in November 2022.

    The artificial intelligence market

    Valued at around 142.3 billion U.S. dollars in 2022, the artificial intelligence market is one of the most promising tech segments for the rest of the decade, with more than five billion U.S. dollars invested in startups, the most notable being the Californian company OpenAI and its flagship application ChatGPT. Disruptive as it is, the adoption of AI has already raised alarm in several industries: it is likely to affect job markets and raises concerns about cybercrime and other online misdeeds.

    The future of online search?

    Of all industries, the online search market may feel the impact of the new tool developed by OpenAI most strongly, like a global earthquake. With chatbots providing search results in a dialogue format, the trend of AI-powered search engines unleashed by ChatGPT threw giant companies like Google and Microsoft into a race with startups and other competitors to present the best candidate for this disruptive (and experimental) online solution.

  4. Corporations Search (Washington state)

    • catalog.data.gov
    • data.wa.gov
    • +1 more
    Updated Sep 6, 2024
    Cite
    data.wa.gov (2024). Corporations Search (Washington state) [Dataset]. https://catalog.data.gov/dataset/corporations-search-from-secretary-of-state
    Explore at:
    Dataset updated
    Sep 6, 2024
    Dataset provided by
    data.wa.gov
    Area covered
    Washington
    Description

    This provides a link to the Washington Secretary of State's Corporations Search tool. The Corporations Data Extract feature is no longer available. Customers needing a list of multiple businesses can use our advanced search to create a list of businesses under specific parameters. You can export this information to an Excel spreadsheet to sort and search more extensively. Below are the steps to perform this type of search. More specific search parameters provide narrower search results.

    1. Please visit our Corporations and Charities Filing System by following this link: https://ccfs.sos.wa.gov/
    2. Scroll down to the “Corporation Search” section and click the “Advanced Search” button on the right.
    3. Under the first section, specify how you would like the business name searched. Only use this for single business lookups unless all the businesses you are searching have a common name (use the “contains” selection).
    4. Select the appropriate business type from the dropdown if you are looking for a list of a specific business type.
    5. For a list of a particular business type with a specific status, select that status under “Business Status.” You can also search by expiration date in this section.
    6. Under the “Date of Incorporation/Formation/Registration,” you can search by start or end date.
    7. Under the “Registered Agent/Governor Search” section, you can search all businesses with the same registered agent on record or governor listed.
    8. Once you have made all your search selections, click the green “Search” button at the bottom right of the page.
    9. A list will populate; scroll to the bottom and select the green Excel document icon with CSV. An Excel document should automatically download. If you have popups blocked, please unblock our site, and try again.
    10. Once you have opened the downloaded Excel spreadsheet, you can adjust the width of each column and sort the data using the data tab. You can also search by pressing CTRL+F on a Windows keyboard.

  5. Data for study "Direct Answers in Google Search Results"

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 9, 2020
    Cite
    Rutecka, Paulina (2020). Data for study "Direct Answers in Google Search Results" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3541091
    Explore at:
    Dataset updated
    Jun 9, 2020
    Dataset provided by
    Strzelecki, Artur
    Rutecka, Paulina
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The goal of this research is to examine direct answers in the Google web search engine. The dataset was collected using Senuto (https://www.senuto.com/). Senuto is an online tool that extracts data on website visibility from the Google search engine.

    Dataset contains the following elements:

    • keyword
    • number of monthly searches
    • featured domain
    • featured main domain
    • featured position
    • featured type
    • featured url
    • content
    • content length

    The dataset with visibility structure has 743,798 keywords that resulted in SERPs with a direct answer.

  6. Data from: Efficient Keyword-Based Search for Top-K Cells in Text Cube

    • catalog.data.gov
    Updated Apr 10, 2025
    + more versions
    Cite
    Dashlink (2025). Efficient Keyword-Based Search for Top-K Cells in Text Cube [Dataset]. https://catalog.data.gov/dataset/efficient-keyword-based-search-for-top-k-cells-in-text-cube
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches. Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011.

  7. IM3 GO WEST Parameter Search Dataset

    • osti.gov
    Updated Feb 4, 2023
    Cite
    MultiSector Dynamics - Living, Intuitive, Value-adding, Environment (2023). IM3 GO WEST Parameter Search Dataset [Dataset]. http://doi.org/10.57931/1923267
    Explore at:
    Dataset updated
    Feb 4, 2023
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
    Description

    GO WEST is an open-source power grid modeling framework for the U.S. Western Interconnection, which allows users to tailor the model depending on their research study and science questions. It is developed to address weather and water dynamics, and associated vulnerabilities, in this bulk power system. It covers 28 balancing authorities (BAs) and 12 states in the U.S. Western Interconnection. GO WEST allows users to select different numbers of nodes and come up with a simplified network by utilizing the 10,000-node topology of the U.S. Western Interconnection created by Texas A&M University. Users can try different numbers of nodes, mathematical formulations (linear programming vs. mixed-integer linear programming), transmission line limit scaling factors, and hurdle rate scaling factors. GO WEST offers a unit commitment and economic dispatch (UC/ED) module to simulate grid operations on an hourly scale. Users can calibrate and validate their model versions by comparing model outputs to historical datasets. GO WEST can therefore help researchers strike a balance between model fidelity (i.e., accuracy) and computational complexity (i.e., runtime). This dataset includes model inputs and outputs from 600 model versions for each of 2019, 2020, and 2021. The folder naming convention is as follows: Exp{Number of Nodes}_{Mathematical Formulation}_{Transmission Line Limit Scaling Factor in MW}_{Hurdle Rate Scaling Factor in %}_{Year}. Linear programming is designated with the "simple" label, whereas mixed-integer linear programming is designated with the "coal" label. For example, the "Exp100_simple_1000_50_2019" folder contains inputs and outputs from the 2019 model version with 100 nodes, linear programming, a +1000 MW transmission line limit scaling factor, and a +50% hurdle rate scaling factor. The GO WEST GitHub repository hosts all raw datasets, processing scripts, and model scripts. Please refer to the README file for a detailed description of the included files.
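
    The folder naming convention lends itself to automated parsing. The following is a minimal sketch, not code from the GO WEST repository: the names FOLDER_PATTERN, ModelVersion and parse_folder_name are hypothetical, and the pattern assumes both scaling factors are written as non-negative integers, as in the one example given. Folders that use decimals or signs would need a looser pattern.

```python
import re
from dataclasses import dataclass

# Exp{nodes}_{formulation}_{line limit scaling, MW}_{hurdle rate scaling, %}_{year}
# Assumption: both scaling factors are plain non-negative integers.
FOLDER_PATTERN = re.compile(
    r"^Exp(?P<nodes>\d+)_(?P<label>simple|coal)"
    r"_(?P<line_limit_mw>\d+)_(?P<hurdle_rate_pct>\d+)_(?P<year>\d{4})$"
)

# Labels as described in the dataset text.
FORMULATIONS = {
    "simple": "linear programming",
    "coal": "mixed-integer linear programming",
}


@dataclass(frozen=True)
class ModelVersion:
    nodes: int
    formulation: str
    line_limit_mw: int
    hurdle_rate_pct: int
    year: int


def parse_folder_name(name: str) -> ModelVersion:
    """Split an experiment folder name into its five documented components."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized folder name: {name!r}")
    return ModelVersion(
        nodes=int(match["nodes"]),
        formulation=FORMULATIONS[match["label"]],
        line_limit_mw=int(match["line_limit_mw"]),
        hurdle_rate_pct=int(match["hurdle_rate_pct"]),
        year=int(match["year"]),
    )


version = parse_folder_name("Exp100_simple_1000_50_2019")
print(version)
```

    Parsing into a small record makes it straightforward to group or filter the 600 model versions per year by any one of the parameters.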

  8. Data from: Variation in quality of women's health topic information from...

    • zenodo.org
    Updated Jul 8, 2025
    Cite
    Benjamin Duval (2025). Variation in quality of women's health topic information from systematic internet searches [Dataset]. http://doi.org/10.5281/zenodo.15839790
    Explore at:
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Duval
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    METHODS

    Topic determination

    The project was developed as a team science exercise during a course on Nutrient Biology (New Mexico Institute of Mining and Technology, New Mexico, USA; BIOL 4089/5089). Students were all women pursuing degrees in Biology and Earth Science, with extensive internet search acumen developed from coursework and personal experience. We (students and professor) devoted ~5 hours to discussing women’s health topics prior to searching, defining search criteria, and developing a scoring system. These discussions led to a list of 12 non-cancer health topics particular to women’s health associated with human cis-gender female biology. Considerations of transgender health were discussed, with the consensus decision that those issues are scientifically relevant but deserve a separate analysis not included here.

    Search protocol

    After agreeing on search terms, we experimented with settings in the Advanced Search feature in Google (www.google.com), and collectively agreed to the following settings: Language (English); search terms appearing in the “text” of the page; ANY of the terms “woman”, “women”, “female”; ALL terms when using a single topic from the list above with the addition of the word “nutrient”. Figure 1 shows a screenshot of how a search was conducted for endometriosis as an example. To standardize data collection among investigators, all results from the first 5 pages of results were collected. Search result URLs were followed, where a suite of data were gathered (variables in Table 2) and entered into a shared database (Appendix 1). Definitions for each variable (Table 2) were articulated following a 1-week trial period and further group discussion. Variables were defined to minimize subjectivity across investigators, clarify the reporting of results, and standardize data collection.

    Scoring metric

    The scoring metric was developed so that the mean and variation (standard deviation, SD; standard error, SE) could be calculated for each topic and compared among topics, and to answer how much variation in quality is likely to be encountered across categories of women’s health issues. We report both variation metrics because SD encompasses the variation of the data set, while SE scales for sample size variation among categorical variables. When searching topics using the same criteria:

    1. Are some topics more likely to return pages with scientifically verifiable information?

    2. Does the variation in quality differ between topics?

    Peer-reviewed journal articles were included in the database if encountered in the searches but were removed before statistical analysis. The justification for removing those sources was that it is possible the Google algorithm included those sources disproportionately for our group of college students and a professor who regularly searches for academic articles. We also assume those sources are consulted less frequently by lay audiences searching for health information.

    Scores were based on six binary (presence/absence) attributes of each web page evaluated. These were: Author (name present/absent), author credentials given, reviewer, reviewer credentials, sources listed, peer-reviewed sources listed. A score of 1 was given if the attribute was present, and 0 if absent. The total number of references cited on a webpage, as well as the number of those that were peer-reviewed (Table 2) were recorded, but for scoring purposes, a 1 or 0 was assigned if there were or were not references and peer-reviewed references, respectively. Potential scores thus ranged from 0 to 6.
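
    The scoring rule and the per-topic summary statistics described above can be sketched as follows. This is an illustration only: the field names, the helper names page_score and summarize, and the example page are hypothetical and not taken from the study's database. The sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import math
import statistics

# The six presence/absence attributes named in the text (field names are hypothetical).
ATTRIBUTES = (
    "author",
    "author_credentials",
    "reviewer",
    "reviewer_credentials",
    "sources",                # may hold a count; any non-zero count scores 1
    "peer_reviewed_sources",  # may hold a count; any non-zero count scores 1
)


def page_score(page):
    """Score a web page from 0 to 6: one point per attribute that is present."""
    return sum(1 for attribute in ATTRIBUTES if page.get(attribute))


def summarize(scores):
    """Return (mean, SD, SE) for the scores of one topic.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    se = sd / math.sqrt(len(scores))
    return mean, sd, se


# Hypothetical page: named author, 12 references, 3 of them peer-reviewed.
example_page = {
    "author": True,
    "author_credentials": False,
    "reviewer": False,
    "reviewer_credentials": False,
    "sources": 12,
    "peer_reviewed_sources": 3,
}
print(page_score(example_page))

mean, sd, se = summarize([2, 4, 6])
print(mean, sd, round(se, 3))
```

    Because reference counts are collapsed to presence or absence, a page citing 12 sources scores the same on that attribute as a page citing one.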

    We performed a simple validation experiment via anonymous surveys sent to students at our institution (New Mexico Tech), a predominantly STEM-focused public university. Using the final scores from the search result webpages, a single website from each score was selected at random using the RAND() function in Microsoft Excel to assign a random variable as an identifier to each URL, then sorting by that variable and selecting the first article in a given score category. Webpages with scores of 0 or 6 were excluded from the validation experiment. Following institutional review, a survey was sent to the “all student” email list, and recipients were directed to a web survey that asked participants to give a score of 1-5 to each of the 5 random (but previously scored) web pages, without repeating a score. Participants were given minimal information about the project and had no indication the pages had already been assigned scores. Survey results were collected anonymously by having responses routed to a spreadsheet, and no personally identifiable data were collected from participants.

    Statistical analysis

    Differences in mean scores within each health topic and the mean number of sources per evaluated webpage were evaluated by calculating Bayes Factors; response variables (mean score, number of sources) for each topic were compared to a null model of no difference across topics (y ~ category + error). Equal prior weight was given to each potential model. Variance inequality was tested via Levene’s test, and normality was assessed using quantile-quantile plots. Correlation analysis was used to test the strength of the association between individual scores per website and the number of sources cited per website. Because only the presence or absence of sources was considered in the score calculation, the number of sources is independent of score, which justifies correlation analysis. Statistical analyses were conducted in the open-source software package JASP version 0.19.2 (JASP, 2024).

  9. Search ad spend in Germany 2019-2028, by device

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Search ad spend in Germany 2019-2028, by device [Dataset]. https://www.statista.com/statistics/456449/search-advertising-revenue-device-digital-market-outlook-germany/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Germany
    Description

    Over the last two observations, ad spending is forecast to increase significantly in all segments. The trend observed from 2019 to 2028 remains consistent throughout the entire forecast period: there is a continuous increase in the indicator across all segments. Notably, the Search Advertising Desktop segment achieves the highest value of **** billion U.S. dollars in 2028. The Statista Market Insights cover a broad range of additional markets.

  10. Google Trends Dataset

    • kaggle.com
    Updated Feb 13, 2021
    Cite
    Dhruvil Dave (2021). Google Trends Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/1936665
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 13, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dhruvil Dave
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This is a curated dataset of Google Trends over the years. Every year, Google releases the trending search queries all over the world in various categories. It has trends from 2001 to 2020.

    Image Credits: Unsplash - lukecheeser

  11. The language of sound search: Examining User Queries in Audio Search Engines...

    • zenodo.org
    csv, zip
    Updated Oct 15, 2024
    Cite
    Benno Weck; Frederic Font (2024). The language of sound search: Examining User Queries in Audio Search Engines (supplementary materials) [Dataset]. http://doi.org/10.5281/zenodo.13622537
    Explore at:
    csv, zip (available download formats)
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benno Weck; Frederic Font
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset accompanies the paper titled "The Language of Sound Search: Examining User Queries in Audio Search Engines." The study investigates user-generated textual queries within the context of sound search engines, which are commonly used for applications such as foley, sound effects, and general audio retrieval.

    The paper addresses the gap in current research regarding the real-world needs and behaviors of users when designing text-based audio retrieval systems. By analyzing search queries collected from two sources — a custom survey and Freesound query logs — the study provides insights into user behavior in sound search contexts. Our findings reveal that users tend to formulate longer and more detailed queries when not constrained by existing systems, and that both survey and Freesound queries are predominantly keyword-based.

    This dataset contains the raw data collected from the survey and annotations of Freesound query logs.

    Files in This Dataset

    The dataset includes the following files:

    1. participants.csv
      Contains data from the survey participants. Columns:

      • id: A unique identifier for each participant.
      • fluency: Self-reported English language proficiency.
      • experience: Whether the participant has used online sound libraries before.
      • passed_instructions: Boolean value indicating whether the participant advanced past the instructions page in the survey.
    2. annotations.csv
      Contains annotations of the survey responses, detailing the participants' interaction with the sound search tasks. Columns:

      • id: A unique identifier for each annotation.
      • participant_id: Links to the participant’s ID in participants.csv.
      • stimulus_id: Identifier for the stimulus presented to the participant (audio, image, or text description).
      • stimulus_type: The type of stimulus (audio, image, text).
      • audio_result_id: Identifier for the hypothetical audio result presented during the search task.
      • query1: Initial search query submitted based on the stimulus.
      • query2: Refined search query after seeing the hypothetical search result.
      • aspects1: Aspects considered important when formulating the initial query.
      • aspects2: Aspects considered important when refining the query.
      • result_relevance: Participant's rating of the hypothetical search result's relevance.
      • time: Time taken to complete the search task.
    3. freesound_queries_annotated.csv
      Contains annotated Freesound search queries. Columns:

      • query: Text of the search query submitted to Freesound.
      • count: The number of times the specific query was submitted.
      • topic: Annotated topic of the query, based on an ontology derived from AudioSet, with an additional category, Other, which includes non-English queries and NSFW-related content.
    4. survey_stimuli_data.zip
      This ZIP file contains three CSV files corresponding to the three stimulus types used in the survey:

      • Audio stimuli: Categorized sound recordings presented to participants.
      • Image stimuli: Annotated images that prompted sound-related queries.
      • Text stimuli: Summarized descriptions of sounds provided to participants.

    More details on the stimuli and the survey methodology can be found in the accompanying paper.
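
    The files above are linked through participant_id, so a typical first step is to join annotations to participants and compare the initial and refined queries. The following is a minimal sketch using only the documented column names. Every value in the two sample tables is hypothetical (including how fluency, experience and relevance are encoded), and the helper names are invented for illustration.

```python
import csv
import io

# Hypothetical rows; only the headers follow the documented columns.
PARTICIPANTS_CSV = """id,fluency,experience,passed_instructions
p1,native,yes,True
p2,intermediate,no,True
"""

ANNOTATIONS_CSV = """id,participant_id,stimulus_id,stimulus_type,audio_result_id,query1,query2,aspects1,aspects2,result_relevance,time
a1,p1,s1,audio,r1,dog bark,small dog barking outdoors,source,source;location,3,41
a2,p2,s2,text,r2,rain,heavy rain on window,source,source;intensity,4,35
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def mean_word_count(rows, column):
    """Average number of whitespace-separated words in one query column."""
    return sum(len(row[column].split()) for row in rows) / len(rows)


# Index participants by id, then attach each participant's fluency to their annotations.
participants = {row["id"]: row for row in read_rows(PARTICIPANTS_CSV)}
annotations = read_rows(ANNOTATIONS_CSV)
joined = [
    {**annotation, "fluency": participants[annotation["participant_id"]]["fluency"]}
    for annotation in annotations
    if annotation["participant_id"] in participants
]

print("mean words, query1:", mean_word_count(joined, "query1"))
print("mean words, query2:", mean_word_count(joined, "query2"))
```

    With the real files, the two StringIO sources would be replaced by open("participants.csv", newline="") and open("annotations.csv", newline="").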

    Citation

    If you use this dataset in your research, please cite the corresponding paper:

    B. Weck and F. Font, ‘The Language of Sound Search: Examining User Queries in Audio Search Engines’, in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), Tokyo, Japan, Oct. 2024, pp. 181–185.
    @inproceedings{Weck2024,
      author = "Weck, Benno and Font, Frederic",
      title = "The Language of Sound Search: Examining User Queries in Audio Search Engines",
      booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)",
      address = "Tokyo, Japan",
      month = "October",
      year = "2024",
      pages = "181--185"
    }
  12. VIADAT-SEARCH

    • live.european-language-grid.eu
    Updated Nov 12, 2018
    + more versions
    Cite
    (2018). VIADAT-SEARCH [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/18215
    Explore at:
    Dataset updated
    Nov 12, 2018
    License

    https://opensource.org/licenses/BSD-3-Clause

    Description

    VIADAT-SEARCH, in connection with VIADAT-REPO, enables searching transcripts of oral history recordings. Language analysis has been used to preprocess the recordings, which makes it possible to search the full text using multiple criteria, including names and different forms of the same word.

    Developed in cooperation with ÚSD AV ČR and NFA.

  13. KOMPAKK Index of Occupations’ Teleworkability in Germany

    • search.gesis.org
    • da-ra.de
    Updated Apr 29, 2021
    + more versions
    Gädecke, Martin; Struffolino, Emanuela; Zagel, Hannah; Fasang, Anette (2021). KOMPAKK Index of Occupations’ Teleworkability in Germany [Dataset]. http://doi.org/10.7802/2286
    Explore at:
    Dataset updated
    Apr 29, 2021
    Dataset provided by
    GESIS, Köln
    GESIS search
    Authors
    Gädecke, Martin; Struffolino, Emanuela; Zagel, Hannah; Fasang, Anette
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Area covered
    Germany
    Description

    “Telework”, “home office” and “work from home” have recently become very prominent working concepts due to social distancing regulations during the COVID-19 pandemic. According to a study by Kohlrausch and Zucco (2020), in Germany the share of people who regularly work from home has increased from about 4% before the pandemic to approx. 20% during the first wave of the pandemic. Furthermore, the share of workers who alternate between business and home office also increased. In this development, telework was not equally distributed across all occupational and social groups. With the project “Household structures and economic risks in East and West Germany during the COVID-19 pandemic: compensation or accumulation? (KOMPAKK)” we define economic risks that people were exposed to due to the COVID-19 pandemic. We therefore calculate several risk factors based on survey data from 2017 and 2018. As some occupations might be well executed from home while others are not, we created an index which reflects the possibility of working from home.

  14. Leading search engines in Luxembourg 2018-2025, by market share

    • statista.com
    Updated Jul 1, 2025
    Statista (2025). Leading search engines in Luxembourg 2018-2025, by market share [Dataset]. https://www.statista.com/statistics/1040208/market-shares-of-search-engines-in-luxembourg/
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2018 - Jun 2025
    Area covered
    Luxembourg
    Description

    In June 2025, Google search engine had a market share of ***** percent in Luxembourg across all devices. Bing ranked second, holding a market share of **** percent, while DuckDuckGo followed with **** percent.

  15. Dataset of books called .NET framework solutions : in search of the lost...

    • workwithdata.com
    Updated Apr 17, 2025
    Work With Data (2025). Dataset of books called .NET framework solutions : in search of the lost Win32 API [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=.NET+framework+solutions+%3A+in+search+of+the+lost+Win32+API
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is .NET framework solutions : in search of the lost Win32 API. It features 7 columns including author, publication date, language, and book publisher.

  16. SERP data from controversial queries on Google and Bing

    • zenodo.org
    bin, csv, zip
    Updated Apr 28, 2025
    Sal Hagen; Guillén Torres (2025). SERP data from controversial queries on Google and Bing [Dataset]. http://doi.org/10.5281/zenodo.14919504
    Explore at:
    Available download formats: zip, bin, csv
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sal Hagen; Guillén Torres
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 24, 2024
    Description

    Data for the forthcoming publication 'Contested Components: Studying Interface Enrichment as a Form of Content Moderation on Google and Bing'.

    The datasets contain information on SERP components for Google and Bing when querying 2000 controversial and 914 non-controversial questions.

    Files include:

    • question_data.csv: Information on questions sourced from 4chan and leftychan boards in November 2024. Columns include the counts per board (/fit/, /b/, /pol/, /int/, /k/, /lgbt/, and /leftypol/), categorization as controversial/non-controversial, and toxicity scores determined by Perspective API.
    • serp_components.csv: Information on the SERP data gathered using Zoekplaatje. Collected on 24 November 2024.
    • screenshots.zip: Screenshots of all SERPs. Note that at times, expanding the AI Overview box on Google resulted in the search bar overlaying the generated text.
    • component_analysis.ipynb: Code for analyzing the data.

    Component taxonomy and screenshots

    | Search engine | Component name | Count | Notes |
    |:--------------|:-------------------------|-------:|:---------------------------------------------|
    | Google | organic | 19,470 | |
    | Bing | organic | 18,677 | |
    | Google | related-questions | 1,534 | |
    | Bing | related-queries | 1,425 | |
    | Bing | info-card | 1,320 | (each card is its own info-card component) |
    | Google | related-queries | 1,289 | |
    | Bing | organic-answer | 1,140 | (often summarised through AI-assisted means) |
    | Bing | video-widget | 776 | |
    | Bing | organic-showcase | 752 | |
    | Bing | related-questions | 725 | |
    | Google | ai-overview | 499 | |
    | Bing | organic-wiki-widget | 271 | |
    | Google | did-you-mean | 223 | |
    | Bing | related-queries-carousel | 219 | |
    | Bing | info-card-image | 136 | |

  17. A dengue fever predicting model based on Baidu search index data and climate...

    • plos.figshare.com
    xlsx
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dan Liu; Songjing Guo; Mingjun Zou; Cong Chen; Fei Deng; Zhong Xie; Sheng Hu; Liang Wu (2023). A dengue fever predicting model based on Baidu search index data and climate data in South China [Dataset]. http://doi.org/10.1371/journal.pone.0226841
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Dan Liu; Songjing Guo; Mingjun Zou; Cong Chen; Fei Deng; Zhong Xie; Sheng Hu; Liang Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    China
    Description

    With the acceleration of global urbanization and climate change, dengue fever is spreading worldwide. Different levels of dengue fever have also occurred in China, especially in southern China, causing enormous economic losses. Unfortunately, there is no effective treatment for dengue, and the most popular dengue vaccine does not exhibit good curative effects. Therefore, we developed a Generalized Additive Mixed Model (GAMM) that gathered climate factors (mean temperature, relative humidity and precipitation) and Baidu search data during 2011–2015 in Guangzhou city to improve the accuracy of dengue fever prediction. Firstly, the time series dengue fever data were decomposed into seasonal, trend and remainder components by the seasonal-trend decomposition procedure based on loess (STL). Secondly, the time lag of variables was determined in cross-correlation analysis and the order of autocorrelation was estimated using autocorrelation (ACF) and partial autocorrelation functions (PACF). Finally, the GAMM was built and evaluated by comparing it with a Generalized Additive Model (GAM). Experimental results indicated that the GAMM (R²: 0.95 and RMSE: 34.1) has better prediction capability than the GAM (R²: 0.86 and RMSE: 121.9). The study could help government agencies and hospitals respond early to dengue fever outbreaks.

  18. Enterprise Search Platform Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 9, 2025
    Archive Market Research (2025). Enterprise Search Platform Report [Dataset]. https://www.archivemarketresearch.com/reports/enterprise-search-platform-16147
    Explore at:
    Available download formats: doc, pdf, ppt
    Dataset updated
    Feb 9, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Enterprise Search Platform market is anticipated to grow from a market size of 1676 million in 2025 to 3453 million by 2033, at a CAGR of 11.4%. The market growth is attributed to the rising demand for enterprise search platforms to enhance organizational productivity, improve customer experience, and ensure data security. The increasing adoption of cloud-based enterprise search platforms, advancements in artificial intelligence and machine learning, and the growing awareness of the benefits of enterprise search platforms in various industry verticals are driving the market growth. However, the high cost of implementation and maintenance of enterprise search platforms and concerns over data privacy and security may restrain market growth to some extent. The key market segments include cloud-based and on-premises deployment models, and applications in government & commercial offices, banking & finance, healthcare, retail, and other industries. North America and Europe are the dominant regions in the market, with Asia Pacific emerging as a high-growth region. Prominent players in the market include Yext, Elastic, Sinequa, Algolia, Hyland Software, Coveo, Accenture, Opentext, SAP AG, Oracle, Microsoft, Google, MarkLogic Inc., Lucid Work, X1 Technologies, Micro Focus, Baidu, Udesk, Data Grand, Giantan, and XD Tech.

  19. Market share of search engines in Indonesia 2024

    • statista.com
    Statista, Market share of search engines in Indonesia 2024 [Dataset]. https://www.statista.com/statistics/954420/indonesia-market-share-of-search-engines/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 2024
    Area covered
    Indonesia
    Description

    As of January 2024, Google led the search engine market in Indonesia with a ***** percent share of the market. In the same year, Bing and Yahoo! followed with minor market shares.

  20. Most common YouTube search queries Japan 2024, based on index score

    • statista.com
    Updated Jul 10, 2025
    Statista (2025). Most common YouTube search queries Japan 2024, based on index score [Dataset]. https://www.statista.com/statistics/1136581/japan-most-common-youtube-search-queries-based-on-index-score/
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024 - Dec 31, 2024
    Area covered
    Japan
    Description

    "Song" was the leading YouTube query in Japan in 2024. As the top query, it received an index score of 100 points. The word "game" ranked second with an index score of ** points, meaning that it received ** percent of the search volume of the top query.
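
The index score described above can be sketched as a simple rescaling in which the top query is fixed at 100. The search volumes below are hypothetical placeholders (the source masks the real figures), and the function name is an assumption for illustration.

```python
def index_score(volume, top_volume):
    """Scale a query's search volume so that the top query scores 100."""
    if top_volume <= 0:
        raise ValueError("top_volume must be positive")
    return 100 * volume / top_volume


# Hypothetical search volumes; not taken from the Statista data.
volumes = {"song": 2000, "game": 1500}
top = max(volumes.values())
scores = {query: index_score(volume, top) for query, volume in volumes.items()}

for query, score in scores.items():
    print(f"{query}: {score:.0f}")
```

Under this reading, a score of 75 means the query received 75 percent of the top query's search volume.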

The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search

Job Offers Web Scraping Search

Targeted Results to Find the Optimal Work Solution

Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

About this dataset

This dataset collects job offers obtained by web scraping and filtered by specific keywords, locations and times. Users can search the offers for options that match their personal situation, skill set and preferences in terms of location and schedule. The columns provide information on job titles, employer names, locations, time frames and schedules.

How to use the dataset

This dataset is a resource for finding a suitable job based on keyword, location and time parameters. With this information, users can search for the job offers that best fit their needs. Some tips on using the dataset:

  • Start by identifying the type of job offer you want to find. The keyword column helps narrow your search to job postings that contain the word or phrase you are looking for.

  • Next, consider where the job is located. The Location column (Ubicació) tells you where each posting is from, so check that it suits your needs.

  • Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether the role is full-time, part-time, or casual/temporary, so check that it meets your requirements before applying.

  • If hours per week or further schedule details are important criteria, see the Horari and Temps_Oferta columns. Once keyword, location and time frame have been checked, look at the Empresa (company name) and Nom_Oferta (offer name) columns to see who would be employing you.

    Together, these fields should give a motivated job seeker what they need to find a suitable position.
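
The filtering steps above can be sketched with Python's standard csv module. The sample rows are invented and only mimic the column layout of web_scraping_information_offers.csv; the function name and its parameters are assumptions for illustration, and substring matching is just one simple way to filter.

```python
import csv
import io

# Hypothetical sample in the column layout of web_scraping_information_offers.csv.
# The rows are invented and only illustrate the filtering.
SAMPLE = """Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari
Data Analyst,Acme SL,Barcelona,Fa 2 dies,Jornada completa
Cambrer/a,Bar Example,Girona,Fa 5 dies,Jornada parcial
Junior Data Engineer,Example Tech,Barcelona,Fa 1 dia,Jornada completa
"""


def filter_offers(rows, keyword=None, location=None, schedule=None):
    """Keep offers whose fields contain the given substrings (case-insensitive).

    A criterion left as None is ignored.
    """
    def matches(value, needle):
        return needle is None or needle.lower() in value.lower()

    return [
        row for row in rows
        if matches(row["Nom_Oferta"], keyword)
        and matches(row["Ubicació"], location)
        and matches(row["Horari"], schedule)
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
hits = filter_offers(rows, keyword="data", location="barcelona")
for offer in hits:
    print(offer["Nom_Oferta"], "|", offer["Empresa"], "|", offer["Temps_Oferta"])
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument would be replaced by a file object opened with `newline=""` and a suitable encoding (UTF-8 is assumed here, not confirmed by the source).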

Research Ideas

  • Machine learning can be used to group job offers in order to identify similarities and differences between them. This could allow users to target their job search more specifically.
  • The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better-informed decisions about their career options and goals.
  • It may also provide insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.
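
As a rough illustration of the first idea, the sketch below groups offer titles by word overlap (Jaccard similarity) using a greedy pass. This is a deliberately simple stand-in for a machine-learning clustering method, not something the dataset page prescribes; the titles, the threshold of 0.3 and all function names are assumptions.

```python
def tokens(title):
    """Lower-case a title and split it into a set of words."""
    return set(title.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_titles(titles, threshold=0.3):
    """Greedily assign each title to the first group whose first member
    is similar enough; otherwise start a new group."""
    groups = []
    for title in titles:
        for group in groups:
            if jaccard(tokens(title), tokens(group[0])) >= threshold:
                group.append(title)
                break
        else:
            groups.append([title])
    return groups


# Hypothetical offer titles (values of the Nom_Oferta column would go here).
titles = [
    "Data Analyst",
    "Senior Data Analyst",
    "Cambrer/a de sala",
    "Junior Data Engineer",
    "Cambrer/a",
]
groups = group_titles(titles)
for group in groups:
    print(group)
```

The result depends on the input order and on the threshold, so a real analysis would likely use a proper clustering method on richer text features.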

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: web_scraping_information_offers.csv

| Column name | Description |
|:-----------------|:------------------------------------|
| Nom_Oferta | Name of the job offer. (String) |
| Empresa | Company offering the job. (String) |
| Ubicació | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String) |
| Horari | Schedule of the job offer. (String) |

