42 datasets found

Job Offers Web Scraping Search
kaggle.com
Updated Feb 11, 2023
Cite
The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Job Offers Web Scraping Search

Targeted Results to Find the Optimal Work Solution

About this dataset

This dataset collects job offers gathered by web scraping and filtered by specific keywords, locations and times. It gives users precise search capabilities for finding the working arrangement that suits them best. Users can explore options that match their personal situation, skill set and preferences for location and schedule. The columns give job titles, employer names, locations, time frames and other parameters needed to make an informed choice about a next career opportunity.

How to use the dataset

This dataset is a great resource for those looking to find an optimal work solution based on keywords, location and time parameters. With this information, users can quickly and easily search through job offers that best fit their needs. Here are some tips on how to use this dataset to its fullest potential:

Start by identifying what type of job offer you want to find. The keyword column helps you narrow your search to job postings that contain the word or phrase you are looking for.

Next, consider where the job is located. The Location column (Ubicació) tells you where each posting is from, so make sure it suits your needs.

Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether the role is full-time, part-time, casual or temporary, so check that it meets your requirements before applying.

If hours per week or further schedule details matter to you, see the Horari and Temps_Oferta columns. Once all three criteria (keywords, location and time frame) are covered, look at the Empresa (Company Name) and Nom_Oferta (Post Name) columns to see who would be employing you.

Together, these fields should give a motivated job seeker what they need to find an optimal work solution. Good luck!
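The filtering steps above could be sketched roughly as follows. This is a minimal illustration, not the dataset author's method: the column names come from the Columns table for web_scraping_information_offers.csv, but the sample rows are invented, and matching the keyword against Nom_Oferta is an assumption, since the table lists no separate keyword column.

```python
import csv
import io

# Hypothetical sample rows; only the header matches the documented columns.
SAMPLE_CSV = """Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari
Data Analyst,Example Corp,Barcelona,Fa 2 dies,Jornada completa
Cambrer,Bar Exemple,Girona,Fa 5 dies,Jornada parcial
Junior Data Engineer,Demo SL,Barcelona,Fa 1 dia,Jornada completa
"""


def filter_offers(rows, keyword=None, location=None):
    """Keep offers whose title contains `keyword` and whose location contains `location`."""
    matches = []
    for row in rows:
        # Assumption: the keyword is matched against the offer name (Nom_Oferta).
        if keyword and keyword.lower() not in row["Nom_Oferta"].lower():
            continue
        if location and location.lower() not in row["Ubicació"].lower():
            continue
        matches.append(row)
    return matches


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for offer in filter_offers(rows, keyword="data", location="Barcelona"):
    print(offer["Nom_Oferta"], "|", offer["Empresa"], "|", offer["Horari"])
```

To run this against the real file, replace the in-memory sample with an `open("web_scraping_information_offers.csv", encoding="utf-8")` handle, assuming the file is UTF-8 encoded.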

Research Ideas

Machine learning can be used to group job offers and identify similarities and differences between them, letting users target their search for a work solution more precisely.

The data can be used to compare job offerings across different areas or types of jobs, helping users make better-informed decisions about their career options and goals.

It may also provide insight into the local job market, helping companies and employers identify potential new opportunities or trends that may previously have gone unnoticed.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: web_scraping_information_offers.csv

| Column name | Description |
|:-----------------|:------------------------------------|
| Nom_Oferta | Name of the job offer. (String) |
| Empresa | Company offering the job. (String) |
| Ubicació | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String) |
| Horari | Schedule of the job offer. (String) |

Search Engines in the US - Market Research Report (2015-2030)
ibisworld.com
Updated Jul 15, 2025
Cite
IBISWorld (2025). Search Engines in the US - Market Research Report (2015-2030) [Dataset]. https://www.ibisworld.com/united-states/market-research-reports/search-engines-industry/
Dataset updated
Jul 15, 2025
Dataset authored and provided by
IBISWorld
License
https://www.ibisworld.com/about/termsofuse/
Time period covered
2015 - 2030
Area covered
United States
Description
Search engines, which collect, organize and display knowledge of the internet, remain central to the digital economy but are entering a period of rapid transformation driven by AI and shifting user behavior. Over the past five years, internet advertising spending maintained strong momentum, propelled by growing mobile internet access and consumer screen time. Consequently, industry revenue is expected to climb at a CAGR of 9.4% to $316.8 billion, including an anticipated rise of 7.7% in 2025, with profit at 18.6%. The industry stands apart from most in the tech sector, because of its platform-based revenue model, aggregation dynamics and deep integration with the broader digital ecosystem. While user engagement fuels relevance, it is advertiser demand that sustains revenue, requiring a careful balance between utility and monetization. This landscape has been reshaped by the rise of generative AI. Conversational tools and AI-generated summaries are reducing user interaction with traditional search results, challenging established SEO practices and disrupting referral-based traffic flows. Meanwhile, search engines are reconfiguring their ad models to prioritize quality and contextual relevance, moving away from legacy monetization strategies. These trends signal a broader shift in how search platforms operate, less as navigational tools and more as integrated, AI-driven environments. As digital behavior fragments and users seek information across apps like Amazon, TikTok and ChatGPT, industry revenue is still projected to climb at a CAGR of 7.3% to $449.9 billion through 2030. Advertisers are expected to continue investing in search, drawn by the format’s performance insights and optimization capabilities. However, AI is redefining search from a navigational tool into a task-oriented solution engine, where users expect conversational, multimodal and predictive answers instead of traditional results pages. 
To stay relevant, incumbent platforms must evolve into embedded AI utilities that power experiences across devices and enterprise workflows.
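As a quick arithmetic check on the projection above, the compound annual growth rate (CAGR) implied by two revenue figures can be computed directly. This sketch assumes that $316.8 billion is the 2025 figure and that growth compounds over five years to 2030; the description suggests this reading but does not state the base year explicitly.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value in `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


start_2025 = 316.8  # billion USD, from the description
end_2030 = 449.9    # billion USD, from the description
years = 5           # assumption: 2025 base year, 2030 end year

rate = implied_cagr(start_2025, end_2030, years)
print(f"Implied CAGR: {rate:.2%}")
print(f"Projection at 7.3% per year: {project(start_2025, 0.073, years):.1f} billion USD")
```

Under these assumptions the implied rate rounds to the quoted 7.3%, and compounding 7.3% forward lands within about a billion dollars of the quoted 2030 figure, which is consistent with rounding in the published numbers.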
Global search volume for "AI" keyword 2022-2023
statista.com
Updated May 23, 2025
Cite
Statista (2025). Global search volume for "AI" keyword 2022-2023 [Dataset]. https://www.statista.com/statistics/1398211/ai-keyword-traffic-volume/
Dataset updated
May 23, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 2022 - Mar 2023
Area covered
Worldwide
Description
Between June 2022 and March 2023, the traffic volume for the keyword "AI" more than tripled, going from around 7.9 million monthly searches to more than 30.4 million in the last month of the measured period. General interest in artificial intelligence (AI) exploded in markets like the United States by the end of 2022. Likewise, interest in the application programming interfaces (APIs) and plugins of artificial intelligence solutions, especially those of ChatGPT, has seen a major increase since the release of the tool in November 2022.

The artificial intelligence market

Valued at around 142.3 billion U.S. dollars in 2022, the artificial intelligence market is one of the most promising tech segments for the rest of the decade, with more than five billion U.S. dollars invested in startups, the most notable being the Californian company OpenAI and its flagship application ChatGPT. Disruptive as it is, the adoption of AI has already raised alarm in several industries, as it is likely to affect job markets and raises concerns about cybercrime and other online misdeeds.

The future of online search?

Of all industries, the online search market may feel the impact of OpenAI's new tool most strongly, like a global earthquake. With chatbots providing search results in a dialogue format, the trend of AI-powered search engines unleashed by ChatGPT threw giant companies like Google and Microsoft into a race with startups and other competitors to present the best candidate for this disruptive (and experimental) online solution.
Corporations Search (Washington state)
catalog.data.gov
data.wa.gov
Updated Sep 6, 2024
Cite
data.wa.gov (2024). Corporations Search (Washington state) [Dataset]. https://catalog.data.gov/dataset/corporations-search-from-secretary-of-state
Dataset updated
Sep 6, 2024
Dataset provided by
data.wa.gov
Area covered
Washington
Description
This provides a link to the Washington Secretary of State's Corporations Search tool. The Corporations Data Extract feature is no longer available. Customers needing a list of multiple businesses can use our advanced search to create a list of businesses under specific parameters. You can export this information to an Excel spreadsheet to sort and search more extensively. Below are the steps to perform this type of search. The more specified parameter searches provide narrower search results.

Please visit our Corporations and Charities Filing System by following this link: https://ccfs.sos.wa.gov/

Scroll down to the “Corporation Search” section and click the “Advanced Search” button on the right.

Under the first section, specify how you would like the business name searched. Only use this for single business lookups unless all the businesses you are searching have a common name (use the “contains” selection).

Select the appropriate business type from the dropdown if you are looking for a list of a specific business type.

For a list of a particular business type with a specific status, select that status under “Business Status.” You can also search by expiration date in this section.

Under the “Date of Incorporation/Formation/Registration,” you can search by start or end date.

Under the “Registered Agent/Governor Search” section, you can search all businesses with the same registered agent on record or governor listed.

Once you have made all your search selections, click the green “Search” button at the bottom right of the page.

A list will populate; scroll to the bottom and select the green Excel document icon with CSV. An Excel document should automatically download. If you have popups blocked, please unblock our site, and try again.

Once you have opened the downloaded Excel spreadsheet, you can adjust the width of each column and sort the data using the data tab. You can also search by pressing CTRL+F on a Windows keyboard.
Data for study "Direct Answers in Google Search Results"
data.niaid.nih.gov
zenodo.org
Updated Jun 9, 2020
Cite
Rutecka, Paulina (2020). Data for study "Direct Answers in Google Search Results" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3541091
Dataset updated
Jun 9, 2020
Dataset provided by
Strzelecki, Artur
Rutecka, Paulina
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The goal of this research is to examine direct answers in the Google web search engine. The dataset was collected using Senuto (https://www.senuto.com/), an online tool that extracts data on website visibility from the Google search engine.

Dataset contains the following elements:

keyword,

number of monthly searches,

featured domain,

featured main domain,

featured position,

featured type,

featured url,

content,

content length.

The dataset with visibility structure has 743 798 keywords that resulted in SERPs with a direct answer.
Data from: Efficient Keyword-Based Search for Top-K Cells in Text Cube
catalog.data.gov
Updated Apr 10, 2025
Cite
Dashlink (2025). Efficient Keyword-Based Search for Top-K Cells in Text Cube [Dataset]. https://catalog.data.gov/dataset/efficient-keyword-based-search-for-top-k-cells-in-text-cube
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches. Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011.
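To make the notion of a cell concrete, here is a naive brute-force baseline. It is not one of the four approaches the paper proposes, and the relevance score (average number of query-term occurrences per document in the cell) is a stand-in assumption, not the paper's IR-style model. The rows, dimension names and values are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def top_k_cells(rows, dims, query, k):
    """Enumerate every cell (a value assignment on a subset of dims) and rank by a toy score."""
    cells = defaultdict(list)
    for row in rows:
        # Each row contributes its document to one cell per subset of dimensions.
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                key = tuple((d, row[d]) for d in subset)
                cells[key].append(row["text"])

    terms = [t.lower() for t in query.split()]

    def score(docs):
        # Assumption: average query-term frequency per document in the cell.
        hits = sum(doc.lower().split().count(t) for doc in docs for t in terms)
        return hits / len(docs)

    return heapq.nlargest(k, ((score(docs), key) for key, docs in cells.items()))


# Hypothetical rows: two structural dimensions plus a text document.
rows = [
    {"model": "A", "phase": "cruise", "text": "engine vibration noted"},
    {"model": "A", "phase": "landing", "text": "smooth landing"},
    {"model": "B", "phase": "cruise", "text": "engine engine failure"},
]

for cell_score, cell in top_k_cells(rows, ["model", "phase"], "engine", k=3):
    print(cell_score, cell)
```

This enumeration materializes every cell, so its cost grows exponentially with the number of dimensions, which is the kind of blow-up that motivates exploring only part of the cube and terminating early.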
IM3 GO WEST Parameter Search Dataset
osti.gov
Updated Feb 4, 2023
Cite
MultiSector Dynamics - Living, Intuitive, Value-adding, Environment (2023). IM3 GO WEST Parameter Search Dataset [Dataset]. http://doi.org/10.57931/1923267
Unique identifier
https://doi.org/10.57931/1923267
Dataset updated
Feb 4, 2023
Dataset provided by
Office of Science (http://www.er.doe.gov/)
MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
Description
GO WEST is an open-source power grid modeling framework for the U.S. Western Interconnection, which allows users to tailor the model to their research study and science questions. It was developed to address weather and water dynamics, and associated vulnerabilities, in this bulk power system. It covers 28 balancing authorities (BA) and 12 states in the U.S. Western Interconnection. GO WEST allows users to select different numbers of nodes and derive a simplified network from the 10,000-node topology of the U.S. Western Interconnection created by Texas A&M University. Users can try different numbers of nodes, mathematical formulations (linear programming vs. mixed-integer linear programming), transmission line limit scaling factors, and hurdle rate scaling factors. GO WEST offers a unit commitment and economic dispatch (UC/ED) module to simulate grid operations on an hourly scale. Users can thus calibrate and validate their model versions by comparing model outputs to historical datasets. GO WEST can therefore help researchers strike a balance between model fidelity (i.e. accuracy) and computational complexity (i.e. runtime). This dataset includes model inputs and outputs from 600 model versions for each of 2019, 2020, and 2021. The folder naming convention is as follows: Exp{Number of Nodes}_{Mathematical Formulation}_{Transmission Line Limit Scaling Factor in MW}_{Hurdle Rate Scaling Factor in %}_{Year}. Linear programming is designated with the "simple" label, whereas mixed-integer linear programming is designated with the "coal" label. For example, the "Exp100_simple_1000_50_2019" folder contains inputs and outputs from the 2019 model version with 100 nodes, linear programming, a +1000 MW transmission line limit scaling factor, and a +50% hurdle rate scaling factor. The GO WEST GitHub repository hosts all raw datasets, processing scripts, and model scripts. Please refer to the README file for a detailed description of the included files.
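The folder naming convention described above can be parsed mechanically. The sketch below is not part of the GO WEST repository; the `Experiment` type and `parse_folder` function are hypothetical names, and it assumes that both scaling factors appear in folder names as non-negative integers, as in the one example the description gives.

```python
import re
from typing import NamedTuple


class Experiment(NamedTuple):
    nodes: int
    formulation: str
    line_limit_scaling_mw: int
    hurdle_rate_scaling_pct: int
    year: int


# Labels taken from the description: "simple" = LP, "coal" = MILP.
FORMULATIONS = {
    "simple": "linear programming",
    "coal": "mixed-integer linear programming",
}

# Assumption: scaling factors are written as plain non-negative integers.
FOLDER_PATTERN = re.compile(r"^Exp(\d+)_(simple|coal)_(\d+)_(\d+)_(\d{4})$")


def parse_folder(name):
    """Split a folder name such as 'Exp100_simple_1000_50_2019' into its fields."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized folder name: {name!r}")
    nodes, label, line_limit, hurdle, year = match.groups()
    return Experiment(int(nodes), FORMULATIONS[label], int(line_limit), int(hurdle), int(year))


print(parse_folder("Exp100_simple_1000_50_2019"))
```

Returning a named tuple keeps the fields self-describing, which makes it straightforward to group or filter the model versions by year or formulation.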
Data from: Variation in quality of women's health topic information from...
zenodo.org
Updated Jul 8, 2025
Cite
Benjamin Duval (2025). Variation in quality of women's health topic information from systematic internet searches [Dataset]. http://doi.org/10.5281/zenodo.15839790
Unique identifier
https://doi.org/10.5281/zenodo.15839790
Dataset updated
Jul 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Duval
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
METHODS

Topic determination

The project was developed as a team science exercise during a course on Nutrient Biology (New Mexico Institute of Mining and Technology, New Mexico, USA; BIOL 4089/5089). Students were all women pursuing degrees in Biology and Earth Science, with extensive internet search acumen developed from coursework and personal experience. We (students and professor) devoted ~5 hours to discussing women’s health topics prior to searching, defining search criteria, and developing a scoring system. These discussions led to a list of 12 non-cancer health topics particular to women’s health associated with human cis-gender female biology. Considerations of transgender health were discussed, with the consensus decision that those issues are scientifically relevant but deserving of a separate analysis not included here.

Search protocol

After agreeing on search terms, we experimented with settings in the Advanced Search feature in Google (www.google.com), and collectively agreed to the following settings: Language (English); search terms appearing in the “text” of the page; ANY of the terms “woman”, “women” ,“female”; ALL terms when using a single topic from list above with the addition of the word “nutrient”. Figure 1 shows a screenshot for how a search was conducted for endometriosis as an example. To standardize data collection among investigators, all results from the first 5 pages of results were collected. Search result URLs were followed, where a suite of data were gathered (variables in Table 2) and entered into a shared database (Appendix 1). Definitions for each variable (Table 2) were articulated following a 1-week trial period and further group discussion. Variables were defined to minimize subjectivity across investigators, clarify the reporting of results, and standardize data collection.

Scoring metric

The scoring metric was developed to allow the mean and variation (standard deviation, SD; standard error, SE) to be calculated for each topic and compared among topics, and to answer how much variation in quality is likely to be encountered across categories of women’s health issues. We report both variation metrics because SD captures the variation of the data set, while SE scales for sample size variation among categorical variables. When searching topics using the same criteria:

Are some topics more likely to return pages with scientifically verifiable information?

Does the variation in quality differ between topics?

Peer-reviewed journal articles were included in the database if encountered in the searches but were removed before statistical analysis. The justification for removing those sources was that it is possible the Google algorithm included those sources disproportionately for our group of college students and a professor who regularly searches for academic articles. We also assume those sources are consulted less frequently by lay audiences searching for health information.

Scores were based on six binary (presence/absence) attributes of each web page evaluated. These were: Author (name present/absent), author credentials given, reviewer, reviewer credentials, sources listed, peer-reviewed sources listed. A score of 1 was given if the attribute was present, and 0 if absent. The total number of references cited on a webpage, as well as the number of those that were peer-reviewed (Table 2) were recorded, but for scoring purposes, a 1 or 0 was assigned if there were or were not references and peer-reviewed references, respectively. Potential scores thus ranged from 0 to 6.
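The scoring rule above, together with the per-topic mean, SD and SE, can be sketched as follows. The field names are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption; the description does not say which estimator the authors used.

```python
import math
import statistics


def page_score(page):
    """Sum six presence/absence attributes into a score from 0 to 6."""
    flags = [
        page["author"],
        page["author_credentials"],
        page["reviewer"],
        page["reviewer_credentials"],
        page["n_references"] > 0,     # counts are recorded, but only presence is scored
        page["n_peer_reviewed"] > 0,  # likewise for peer-reviewed references
    ]
    return sum(1 for flag in flags if flag)


def summarize(scores):
    """Return (mean, SD, SE) for one topic's page scores."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)        # assumption: sample SD
    se = sd / math.sqrt(len(scores))     # standard error of the mean
    return mean, sd, se


# Hypothetical page record, for illustration only.
example_page = {
    "author": True,
    "author_credentials": True,
    "reviewer": False,
    "reviewer_credentials": False,
    "n_references": 12,
    "n_peer_reviewed": 0,
}
print(page_score(example_page))
print(summarize([1, 2, 3, 4, 5]))
```

Because the reference counts enter the score only as 0 or 1, a page with 12 references scores the same on that attribute as a page with one.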

We performed a simple validation experiment via anonymous surveys sent to students at our institution (New Mexico Tech), a predominantly STEM-focused public university. Using the final scores from the search result webpages, a single website from each score was selected at random using the RAND() function in Microsoft Excel to assign a random variable as an identifier to each URL, then sorting by that variable and selecting the first article in a given score category. Webpages with scores of 0 or 6 were excluded from the validation experiment. Following institutional review, a survey was sent to the “all student” email list, and recipients were directed to a web survey that asked participants to give a score of 1-5 to each of the 5 random (but previously scored) web pages, without repeating a score. Participants were given minimal information about the project and had no indication the pages had already been assigned scores. Survey results were collected anonymously by having responses routed to a spreadsheet, and no personally identifiable data were collected from participants.

Statistical analysis

Differences in mean scores within each health topic and the mean number of sources per evaluated webpage were evaluated by calculating Bayes Factors; response variables (mean score, number of sources) for each topic were compared to a null model of no difference across topics (y ~ category + error). Equal prior weight was given to each potential model. Variance inequality was tested via Levene’s test, and normality was assessed using quantile-quantile plots. Correlation analysis was used to test the strength of the association between individual scores per website and the number of sources cited per website. Because only the presence or absence of sources was considered in the score calculation, the number of sources is independent of score, which justifies correlation analysis. Statistical analyses were conducted in the open-source software package JASP version 0.19.2 (JASP, 2024).
Search ad spend in Germany 2019-2028, by device
statista.com
Updated Jul 9, 2025
Cite
Statista (2025). Search ad spend in Germany 2019-2028, by device [Dataset]. https://www.statista.com/statistics/456449/search-advertising-revenue-device-digital-market-outlook-germany/
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Germany
Description
Over the last two observations, ad spending is forecast to increase significantly in all segments. The trend observed from 2019 to 2028 remains consistent throughout the entire forecast period, with a continuous increase in the indicator across all segments. Notably, the Search Advertising Desktop segment achieves the highest value of **** billion U.S. dollars in 2028. The Statista Market Insights cover a broad range of additional markets.
Google Trends Dataset
kaggle.com
Updated Feb 13, 2021
Cite
Dhruvil Dave (2021). Google Trends Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/1936665
Unique identifier
https://doi.org/10.34740/kaggle/dsv/1936665
Dataset updated
Feb 13, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dhruvil Dave
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
This is a curated dataset of Google Trends over the years. Every year, Google releases the trending search queries all over the world in various categories. It has trends from 2001 to 2020.

Image Credits: Unsplash - lukecheeser
The language of sound search: Examining User Queries in Audio Search Engines...
zenodo.org
csv, zip
Updated Oct 15, 2024
Cite
Benno Weck; Frederic Font (2024). The language of sound search: Examining User Queries in Audio Search Engines (supplementary materials) [Dataset]. http://doi.org/10.5281/zenodo.13622537
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.13622537
Dataset updated
Oct 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benno Weck; Frederic Font
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

This dataset accompanies the paper titled "The Language of Sound Search: Examining User Queries in Audio Search Engines." The study investigates user-generated textual queries within the context of sound search engines, which are commonly used for applications such as foley, sound effects, and general audio retrieval.

The paper addresses the gap in current research regarding the real-world needs and behaviors of users when designing text-based audio retrieval systems. By analyzing search queries collected from two sources — a custom survey and Freesound query logs — the study provides insights into user behavior in sound search contexts. Our findings reveal that users tend to formulate longer and more detailed queries when not constrained by existing systems, and that both survey and Freesound queries are predominantly keyword-based.

This dataset contains the raw data collected from the survey and annotations of Freesound query logs.

Files in This Dataset

The dataset includes the following files:

participants.csv
Contains data from the survey participants. Columns:

id: A unique identifier for each participant.

fluency: Self-reported English language proficiency.

experience: Whether the participant has used online sound libraries before.

passed_instructions: Boolean value indicating whether the participant advanced past the instructions page in the survey.

annotations.csv
Contains annotations of the survey responses, detailing the participants' interaction with the sound search tasks. Columns:

id: A unique identifier for each annotation.

participant_id: Links to the participant’s ID in participants.csv.

stimulus_id: Identifier for the stimulus presented to the participant (audio, image, or text description).

stimulus_type: The type of stimulus (audio, image, text).

audio_result_id: Identifier for the hypothetical audio result presented during the search task.

query1: Initial search query submitted based on the stimulus.

query2: Refined search query after seeing the hypothetical search result.

aspects1: Aspects considered important when formulating the initial query.

aspects2: Aspects considered important when refining the query.

result_relevance: Participant's rating of the hypothetical search result's relevance.

time: Time taken to complete the search task.

freesound_queries_annotated.csv
Contains annotated Freesound search queries. Columns:

query: Text of the search query submitted to Freesound.

count: The number of times the specific query was submitted.

topic: Annotated topic of the query, based on an ontology derived from AudioSet, with an additional category, Other, which includes non-English queries and NSFW-related content.

survey_stimuli_data.zip
This ZIP file contains three CSV files corresponding to the three stimulus types used in the survey:

Audio stimuli: Categorized sound recordings presented to participants.

Image stimuli: Annotated images that prompted sound-related queries.

Text stimuli: Summarized descriptions of sounds provided to participants.

More details on the stimuli and the survey methodology can be found in the accompanying paper.
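The file descriptions above imply that annotations.csv joins to participants.csv through participant_id. A minimal sketch of that join, using only the standard library, might look like this. The headers follow the column lists above, but every data value is invented for illustration, and the real files may encode fields such as fluency or passed_instructions differently.

```python
import csv
import io

# Hypothetical rows; only the headers come from the dataset description.
PARTICIPANTS_CSV = """id,fluency,experience,passed_instructions
p1,native,yes,True
p2,fluent,no,True
p3,basic,no,False
"""

ANNOTATIONS_CSV = """id,participant_id,stimulus_id,stimulus_type,audio_result_id,query1,query2,aspects1,aspects2,result_relevance,time
a1,p1,s1,audio,r1,dog bark,small dog barking outdoors,source,source;location,3,42
a2,p2,s2,text,r2,rain,heavy rain on tin roof,source,source;material,4,55
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Index participants by id, then attach participant fields to each annotation.
participants = {row["id"]: row for row in read_rows(PARTICIPANTS_CSV)}
joined = []
for annotation in read_rows(ANNOTATIONS_CSV):
    person = participants.get(annotation["participant_id"])
    if person is None:
        continue  # skip annotations without a matching participant
    joined.append({**annotation, "fluency": person["fluency"], "experience": person["experience"]})


def mean_words(rows, column):
    """Average whitespace-separated word count of a query column."""
    return sum(len(row[column].split()) for row in rows) / len(rows)


print("mean words, query1:", mean_words(joined, "query1"))
print("mean words, query2:", mean_words(joined, "query2"))
```

Comparing the word counts of query1 and query2 is one simple way to look at how participants refine a query after seeing a result; the numbers printed here reflect only the made-up sample rows.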

Citation

If you use this dataset in your research, please cite the corresponding paper:

B. Weck and F. Font, ‘The Language of Sound Search: Examining User Queries in Audio Search Engines’, in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024), Tokyo, Japan, Oct. 2024, pp. 181–185.

@inproceedings{Weck2024, author = "Weck, Benno and Font, Frederic", title = "The Language of Sound Search: Examining User Queries in Audio Search Engines", booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE2024)", address = "Tokyo, Japan", month = "October", year = "2024", pages = "181--185" }
VIADAT-SEARCH
live.european-language-grid.eu
Updated Nov 12, 2018
Cite
(2018). VIADAT-SEARCH [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/18215
Dataset updated
Nov 12, 2018
License
https://opensource.org/licenses/BSD-3-Clause
Description
VIADAT-SEARCH in connection with VIADAT-REPO enables searching transcripts of oral history recordings. Language analysis has been used to preprocess the recordings, which makes it possible to search the fulltext using multiple criteria, including names, different forms of the same word etc.

Developed in cooperation with ÚSD AV ČR and NFA.
KOMPAKK Index of Occupations’ Teleworkability in Germany
search.gesis.org
da-ra.de
Updated Apr 29, 2021
Cite
Gädecke, Martin; Struffolino, Emanuela; Zagel, Hannah; Fasang, Anette (2021). KOMPAKK Index of Occupations’ Teleworkability in Germany [Dataset]. http://doi.org/10.7802/2286
Unique identifier
https://doi.org/10.7802/2286
Dataset updated
Apr 29, 2021
Dataset provided by
GESIS, Köln
GESIS search
Authors
Gädecke, Martin; Struffolino, Emanuela; Zagel, Hannah; Fasang, Anette
License
https://www.gesis.org/en/institute/data-usage-terms
Area covered
Germany
Description
“Telework”, “home office” and “work from home” have recently become very prominent working concepts due to social distancing regulations during the COVID-19 pandemic. According to a study by Kohlrausch and Zucco (2020), in Germany the share of people who regularly work from home has increased from about 4% before the pandemic to approx. 20% during the first wave of the pandemic. Furthermore, the share of workers who alternate between business and home office also increased. In this development, telework was not equally distributed across all occupational and social groups. With the project “Household structures and economic risks in East and West Germany during the COVID-19 pandemic: compensation or accumulation? (KOMPAKK)” we define economic risks that people were exposed to due to the COVID-19 pandemic. We therefore calculate several risk factors based on survey data from 2017 and 2018. As some occupations might be well executed from home while others are not, we created an index which reflects the possibility of working from home.
Leading search engines in Luxembourg 2018-2025, by market share
statista.com
Updated Jul 1, 2025
Cite
Statista (2025). Leading search engines in Luxembourg 2018-2025, by market share [Dataset]. https://www.statista.com/statistics/1040208/market-shares-of-search-engines-in-luxembourg/
Explore at:
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2018 - Jun 2025
Area covered
Luxembourg
Description
In June 2025, the Google search engine had a market share of ***** percent in Luxembourg across all devices. Bing ranked second with a market share of **** percent, followed by DuckDuckGo with **** percent.
Dataset of books called .NET framework solutions : in search of the lost...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called .NET framework solutions : in search of the lost Win32 API [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=.NET+framework+solutions+%3A+in+search+of+the+lost+Win32+API
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is .NET framework solutions : in search of the lost Win32 API. It features 7 columns including author, publication date, language, and book publisher.

SERP data from controversial queries on Google and Bing

zenodo.org

bin, csv, zip

Updated Apr 28, 2025

Cite

Sal Hagen; Guillén Torres (2025). SERP data from controversial queries on Google and Bing [Dataset]. http://doi.org/10.5281/zenodo.14919504

Explore at:

Available download formats: zip, bin, csv

Unique identifier

https://doi.org/10.5281/zenodo.14919504

Dataset updated

Apr 28, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Sal Hagen; Guillén Torres

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Nov 24, 2024

Description

Data for the forthcoming publication 'Contested Components: Studying Interface Enrichment as a Form of Content Moderation on Google and Bing'.

Datasets contain information on SERP components for Google and Bing when querying 2000 controversial and 914 non-controversial questions.

Files include:

question_data.csv: Information on questions sourced from 4chan and leftychan boards in November 2024. Columns include the counts per board (/fit/, /b/, /pol/, /int/, /k/, /lgbt/, and /leftypol/), categorization as controversial/non-controversial, and toxicity scores determined by Perspective API.
serp_components.csv: Information on the SERP data gathered using Zoekplaatje. Collected on 24 November 2024.
screenshots.zip: Screenshots of all SERPs. Note that at times, expanding the AI Overview box on Google resulted in the search bar overlaying the generated text.
component_analysis.ipynb: Code for analyzing the data.

Component taxonomy and screenshots

Search engine	Component name	Count	Example
Google	organic	19,470	Click to view
Bing	organic	18,677	Click to view
Google	related-questions	1,534	Click to view
Bing	related-queries	1,425	Click to view
Bing	info-card	1,320	Click to view (each card is its own info-card component)
Google	related-queries	1,289	Click to view
Bing	organic-answer	1,140	Click to view (often summarised through AI-assisted means)
Bing	video-widget	776	Click to view
Bing	organic-showcase	752	Click to view
Bing	related-questions	725	Click to view
Google	ai-overview	499	Click to view
Bing	organic-wiki-widget	271	Click to view
Google	did-you-mean	223	Click to view
Bing	related-queries-carousel	219	Click to view
Bing	info-card-image	136	Click to view
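
As a rough illustration of how the counts in the table above could be tallied per search engine, the following sketch hard-codes the listed rows and sums them. It is not the authors' component_analysis.ipynb; the names `COMPONENT_COUNTS`, `totals_by_engine` and `component_shares` are hypothetical. It also assumes the table above is the complete list, which may not hold if the listing was truncated.

```python
from collections import defaultdict

# Rows copied from the component table above: (search engine, component name, count).
COMPONENT_COUNTS = [
    ("Google", "organic", 19470),
    ("Bing", "organic", 18677),
    ("Google", "related-questions", 1534),
    ("Bing", "related-queries", 1425),
    ("Bing", "info-card", 1320),
    ("Google", "related-queries", 1289),
    ("Bing", "organic-answer", 1140),
    ("Bing", "video-widget", 776),
    ("Bing", "organic-showcase", 752),
    ("Bing", "related-questions", 725),
    ("Google", "ai-overview", 499),
    ("Bing", "organic-wiki-widget", 271),
    ("Google", "did-you-mean", 223),
    ("Bing", "related-queries-carousel", 219),
    ("Bing", "info-card-image", 136),
]


def totals_by_engine(rows):
    """Sum the listed component counts for each search engine."""
    totals = defaultdict(int)
    for engine, _component, count in rows:
        totals[engine] += count
    return dict(totals)


def component_shares(rows, engine):
    """Share of each listed component within one engine's listed total."""
    total = totals_by_engine(rows)[engine]
    return {
        component: count / total
        for row_engine, component, count in rows
        if row_engine == engine
    }


totals = totals_by_engine(COMPONENT_COUNTS)
for engine, total in totals.items():
    shares = component_shares(COMPONENT_COUNTS, engine)
    top = max(shares, key=shares.get)
    print(f"{engine}: {total} listed components, most frequent is {top} ({shares[top]:.1%})")
```

The shares are relative to the rows listed here only, so they would shift if the full dataset contains component types that this table omits.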

A dengue fever predicting model based on Baidu search index data and climate...
plos.figshare.com
xlsx
Updated Jun 4, 2023
Cite
Dan Liu; Songjing Guo; Mingjun Zou; Cong Chen; Fei Deng; Zhong Xie; Sheng Hu; Liang Wu (2023). A dengue fever predicting model based on Baidu search index data and climate data in South China [Dataset]. http://doi.org/10.1371/journal.pone.0226841
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0226841
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Dan Liu; Songjing Guo; Mingjun Zou; Cong Chen; Fei Deng; Zhong Xie; Sheng Hu; Liang Wu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
China
Description
With the acceleration of global urbanization and climate change, dengue fever is spreading worldwide. Different levels of dengue fever have also occurred in China, especially in southern China, causing enormous economic losses. Unfortunately, there is no effective treatment for dengue, and the most popular dengue vaccine does not exhibit good curative effects. Therefore, we developed a Generalized Additive Mixed Model (GAMM) that combined climate factors (mean temperature, relative humidity and precipitation) and Baidu search data during 2011–2015 in Guangzhou city to improve the accuracy of dengue fever prediction. First, the dengue fever time series was decomposed into seasonal, trend and remainder components by the seasonal-trend decomposition procedure based on loess (STL). Second, the time lag of each variable was determined by cross-correlation analysis, and the order of autocorrelation was estimated using the autocorrelation (ACF) and partial autocorrelation (PACF) functions. Finally, the GAMM was built and evaluated by comparing it with a Generalized Additive Model (GAM). Experimental results indicated that the GAMM (R²: 0.95, RMSE: 34.1) has better prediction capability than the GAM (R²: 0.86, RMSE: 121.9). The study could help government agencies and hospitals respond early to dengue fever outbreaks.
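
The lag-selection step mentioned in the description (choosing a time lag by cross-correlation) can be illustrated with a minimal sketch. This is not the authors' code: the two series are synthetic and constructed only so that the answer is known, and the helper names `pearson` and `best_lag` are hypothetical. It scans lags 0 to `max_lag` and keeps the one with the highest Pearson correlation between the lagged predictor and the target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lag(predictor, target, max_lag):
    """Return (lag, r) maximising corr(predictor[t - lag], target[t])."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = predictor[: len(predictor) - lag]  # predictor shifted back by `lag`
        ys = target[lag:]                       # target aligned to the shifted predictor
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic example (made-up numbers): the "cases" series follows the
# "search index" series with a delay of two time steps.
search_index = [3, 5, 9, 14, 8, 4, 2, 6, 11, 15, 7, 3]
cases = [0, 0] + [2 * value for value in search_index[:-2]]

lag, r = best_lag(search_index, cases, max_lag=4)
print(f"best lag: {lag}, correlation: {r:.3f}")
```

On real data the correlation would rarely be this clean, and the chosen lag would normally be checked against domain knowledge rather than taken from the maximum alone.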
Enterprise Search Platform Report
archivemarketresearch.com
doc, pdf, ppt
Updated Feb 9, 2025
Cite
Archive Market Research (2025). Enterprise Search Platform Report [Dataset]. https://www.archivemarketresearch.com/reports/enterprise-search-platform-16147
Explore at:
Available download formats: doc, pdf, ppt
Dataset updated
Feb 9, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Enterprise Search Platform market is anticipated to grow from a market size of 1676 million in 2025 to 3453 million by 2033, at a CAGR of 11.4%. The market growth is attributed to the rising demand for enterprise search platforms to enhance organizational productivity, improve customer experience, and ensure data security. The increasing adoption of cloud-based enterprise search platforms, advancements in artificial intelligence and machine learning, and the growing awareness of the benefits of enterprise search platforms in various industry verticals are driving the market growth. However, the high cost of implementation and maintenance of enterprise search platforms and concerns over data privacy and security may restrain market growth to some extent. The key market segments include cloud-based and on-premises deployment models, and applications in government & commercial offices, banking & finance, healthcare, retail, and other industries. North America and Europe are the dominant regions in the market, with Asia Pacific emerging as a high-growth region. Prominent players in the market include Yext, Elastic, Sinequa, Algolia, Hyland Software, Coveo, Accenture, Opentext, SAP AG, Oracle, Microsoft, Google, MarkLogic Inc., Lucid Work, X1 Technologies, Micro Focus, Baidu, Udesk, Data Grand, Giantan, and XD Tech.
Market share of search engines in Indonesia 2024
statista.com
Cite
Statista, Market share of search engines in Indonesia 2024 [Dataset]. https://www.statista.com/statistics/954420/indonesia-market-share-of-search-engines/
Explore at:
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 2024
Area covered
Indonesia
Description
As of January 2024, Google led the search engine market in Indonesia with a ***** percent share of the market. In the same year, Bing and Yahoo! followed with minor market shares.
Most common YouTube search queries Japan 2024, based on index score
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Most common YouTube search queries Japan 2024, based on index score [Dataset]. https://www.statista.com/statistics/1136581/japan-most-common-youtube-search-queries-based-on-index-score/
Explore at:
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jan 1, 2024 - Dec 31, 2024
Area covered
Japan
Description
"Song" was the leading YouTube query in Japan in 2024. As the top query, it received an index score of 100 points. The word "game" ranked second with an index score of ** points, meaning that it received ** percent of the search volume of the top query.

Cite

The Devastator (2023). Job Offers Web Scraping Search [Dataset]. https://www.kaggle.com/datasets/thedevastator/job-offers-web-scraping-search

Job Offers Web Scraping Search

Targeted Results to Find the Optimal Work Solution

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Dataset updated

Feb 11, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

The Devastator

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

By [source]

About this dataset

This dataset collects job offers gathered by web scraping and filtered by specific keywords, locations and times. It lets users search precisely for the work that suits them best and explore options that match their personal situation, skill set, and preferences for location and schedule. The columns give the job title, employer name, location, time frame and schedule of each offer, so you can make an informed choice about your next career opportunity.

How to use the dataset

This dataset is a resource for finding suitable work based on keyword, location and time parameters, letting users search quickly through the job offers that best fit their needs. Here are some tips on how to use it:

Start by identifying the type of job offer you want to find. The keyword column helps you narrow your search to job postings that contain the word or phrase you are looking for.

Next, consider where the job is located. The Location column tells you where each posting is from, so check that it is somewhere that suits your needs.

Finally, consider when the position is available. The Time frame column indicates when each posting was made and whether the role is full-time, part-time, or casual/temporary, so check that it meets your requirements before applying.

If details such as hours per week or further schedule information are important to you, see the Horari and Temps_Oferta columns. Once keywords, location and time frame have been checked, look at the Empresa (Company Name) and Nom_Oferta (Post Name) columns to get an idea of who would be employing you.

Together, these fields should give a motivated job seeker what they need to find a suitable position.

Research Ideas

Machine learning can be used to group job offers in order to identify similarities and differences between them. This could allow users to target their search more specifically.

The data can be used to compare job offerings across different areas or types of jobs, enabling users to make better informed decisions in terms of their career options and goals.

It may also provide insight into the local job market, enabling companies and employers to identify potential new opportunities or trends that may previously have gone unnoticed.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: web_scraping_information_offers.csv

| Column name | Description |
|:-----------------|:------------------------------------|
| Nom_Oferta | Name of the job offer. (String) |
| Empresa | Company offering the job. (String) |
| Ubicació | Location of the job offer. (String) |
| Temps_Oferta | Time of the job offer. (String) |
| Horari | Schedule of the job offer. (String) |
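
The search tips above can be sketched with the standard-library `csv` module. The column names come from the table above. The sample rows are invented placeholders, not records from the dataset, and `load_offers` and `filter_offers` are hypothetical helper names. Matching keywords against `Nom_Oferta` and locations against `Ubicació` is an assumption about how the columns would be searched.

```python
import csv
import io

# Hypothetical sample rows for illustration only; the real data lives in
# web_scraping_information_offers.csv and its values will differ.
SAMPLE_CSV = """Nom_Oferta,Empresa,Ubicació,Temps_Oferta,Horari
Data Analyst,Acme,Barcelona,Fa 2 dies,Jornada completa
Cambrer,Bar Example,Girona,Fa 5 dies,Mitja jornada
Data Engineer,Example Corp,Barcelona,Fa 1 dia,Jornada completa
"""


def load_offers(handle):
    """Read job offers from an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))


def filter_offers(rows, keyword=None, location=None):
    """Keep rows whose title contains `keyword` and whose location contains `location`.

    Both filters are optional and case-insensitive.
    """
    matches = []
    for row in rows:
        if keyword and keyword.casefold() not in row["Nom_Oferta"].casefold():
            continue
        if location and location.casefold() not in row["Ubicació"].casefold():
            continue
        matches.append(row)
    return matches


offers = load_offers(io.StringIO(SAMPLE_CSV))
barcelona_data_jobs = filter_offers(offers, keyword="data", location="barcelona")
for offer in barcelona_data_jobs:
    print(offer["Nom_Oferta"], "|", offer["Empresa"], "|", offer["Horari"])
```

To run this against the downloaded file instead of the sample, the handle could be opened with `open("web_scraping_information_offers.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.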
